We present a detailed examination of the heavy flavor properties of jets 
produced at the Fermilab Tevatron collider. The data set, collected with 
the Collider Detector at Fermilab, consists of events with two or more jets 
with transverse energy $E_T > 15$ GeV and pseudo-rapidity $|\eta| < 1.5$. The 
heavy flavor content of the data set is enriched by requiring that at least one 
of the jets (lepton-jet) contains a lepton with transverse momentum larger 
than 8 GeV/c. Jets containing hadrons with heavy flavor are selected via the 
identification of secondary vertices. The parton-level cross sections predicted 
by the HERWIG Monte Carlo generator program are tuned within theoretical 
and experimental uncertainties to reproduce the secondary-vertex rates in 
the data. The tuned simulation provides new information on the origin of 
the discrepancy between the $b\bar{b}$ cross section measurements at the Tevatron 
and the next-to-leading order QCD prediction. We also compare the rate 
of away-jets (jets recoiling against the lepton-jet) containing a soft lepton 
($p_T > 2$ GeV/c) in the data to that in the tuned simulation. We find that 
this rate is larger than what is expected for the conventional production and 
semileptonic decay of pairs of hadrons with heavy flavor. 
PACS number(s): 13.85.Qk, 13.20.He, 13.20.Fc 

I. INTRODUCTION 

This paper presents a study of semileptonic decays in jets containing heavy flavor and is 
motivated by several anomalies that have been previously reported. CDF has found the rate 
of jets with both a secondary vertex and a soft lepton (superjets) to be larger than expected 
in the W+ 2,3 jet sample. The kinematical properties of the events with a superjet are 
difficult to reconcile with the standard model (SM) expectation [1]. 

The discrepancy between the single bottom quark cross section measurements at the 
Tevatron and the next-to-leading order (NLO) QCD prediction [2] can be explained either 
in terms of new physics [3] or by the lack of robustness of the NLO prediction [4]. However, 
at the Tevatron, there are two additional discrepancies between the measured and predicted 
values of the $b\bar{b}$ cross section that are more difficult to accommodate within the theoretical 
uncertainty. In Ref. [6], the correlated $\mu$ + b-jet cross section is measured to be 1.5 times larger 
than $\sigma_{b\bar{b}} \times BR$, where BR is the average semileptonic branching ratio of b-hadrons produced 
at the Tevatron and $\sigma_{b\bar{b}}$ is the NLO prediction of the cross section for producing pairs of $b$ 
and $\bar{b}$ quarks. A further discrepancy is found by both the CDF and DØ experiments [7,8] when 
comparing the cross section for producing dimuons from b-hadron semileptonic decays to 
$\sigma_{b\bar{b}} \times BR^2$. The value of $\sigma_{b\bar{b}} \times BR^2$ is found to be approximately 2.2 times larger than the 
NLO prediction ^1. There are possible conventional explanations presented in the literature 
for these anomalies [9,10]. 

^1 In both measurements, $\sigma_{b\bar{b}}$ is the cross section for producing two central bottom quarks, both 
with transverse momentum approximately larger than 10 GeV/c. In this case, the LO and NLO 
predictions are equal within a few percent, and the NLO prediction changes by no more than 15% 
when changing the renormalization and factorization scales by a factor of two [5]. 

However, all these discrepancies could also be mitigated by postulating the existence 
of a light strongly interacting object with a 100% semileptonic branching ratio. Since 
there are no limits on the existence of a charge $-1/3$ scalar quark with mass smaller than 
7.4 GeV/c$^2$ [11-13], the supersymmetric partner of the bottom quark is a potential candidate. 
This paper presents an analysis of multi-jet data intended to search for evidence either 
supporting or disfavoring this hypothesis. 

The strategy of this search is outlined in Sec. II. Section III describes the detector systems 
relevant to this analysis, while the sample selection and the tagging algorithms (SECVTX 
and JPB) used to select heavy flavors are discussed in Sec. IV. Section V describes the 
data sample composition and the heavy flavor simulation. The data set consists of events 
with two or more jets with transverse energy Et > 15 GeV and contained in the silicon 
microvertex detector (SVX) acceptance. The sample is enriched in heavy flavor by requiring 
that at least one of the jets contains a lepton with $p_T > 8$ GeV/c. We use measured rates 
of SECVTX and JPB tags to determine the bottom and charmed content of the data; we 
then tune the simulation to match the heavy-flavor content of the data. The evaluation of 
the number of SECVTX and JPB tags due to heavy flavor in the data and the simulation is 
described in Secs. VI and VII, respectively. The tuning of the heavy flavor production cross 
sections in the simulation is described in Sec. VIII. In Sec. IX we measure the yields of jets 
containing soft leptons ($p_T > 2$ GeV/c), and compare them to the prediction of the tuned 
simulation. Section X contains cross-checks and a discussion of the systematic uncertainties. 
Our conclusions are presented in Sec. XI. 

II. PROBING THE PRODUCTION OF LIGHT SCALAR QUARKS WITH A 
LARGE SEMILEPTONIC BRANCHING RATIO 

In previous publications [1,14] we have compared the b- and c-quark content of several 
samples of generic-jet data to the QCD prediction of the standard model using the HERWIG 
generator program [15]. We identify (tag) jets produced by heavy quarks using the CDF 
silicon microvertex detector (SVX) to locate secondary vertices produced by the decay of b 
and c hadrons inside a jet. These vertices (SECVTX tags) are separated from the primary 
event vertex as a result of the long b and c lifetime. We also use track impact parameters to 
select jets with a small probability of originating from the primary vertex of the event (JPB 
tags) [16]. 

In Ref. [14] we have compared rates of SECVTX and JPB tags in generic-jet data and 
their simulation first to calibrate the efficiency of the tagging algorithms in the simulation, 
and then to tune the heavy flavor cross sections evaluated with the HERWIG parton shower 
Monte Carlo. In the simulation, jets with heavy flavor are produced by heavy quarks in 
the initial or final state of the hard scattering (flavor excitation and direct production, 
respectively) or from gluons branching into $b\bar{b}$ or $c\bar{c}$ pairs (gluon splitting). The fraction of 
generic-jet data containing $b\bar{b}$ or $c\bar{c}$ pairs calculated by HERWIG correctly models the observed 
rate of tags after minor adjustments within the theoretical and experimental uncertainties. 
In Refs. [1,14], we have extended this comparison to W+ jet events. We find again good 
agreement between the observed rates of SECVTX and JPB tags and the SM prediction, 
which includes single and pair production of top quarks. 

We also identify heavy flavors by searching jets for leptons (e or $\mu$) produced in the 
decay of b and c hadrons [1,14]; we refer to these as soft lepton tags (SLT). As shown in 
Refs. [1,14], rates of SLT tags in generic-jet data and in W+ jet events are generally well 
modeled by the simulation. An exception is the rate of SECVTX+SLT tags in the same jet 
(called supertags in Ref. [1]), which in W+ 2,3 jet events is larger than in the simulation, 
whereas in generic-jet samples it is slightly overpredicted by the same simulation. 

This analysis uses two data samples, referred to as the signal or inclusive lepton sample 
and the control or generic-jet sample. The signal sample consists of events with two or 
more jets that have been acquired with the trigger request that events contain a lepton with 
$p_T \geq 8$ GeV/c. The requirement of a jet containing a lepton (lepton-jet) enriches the heavy flavor 
content of the sample with respect to generic jets. The control or generic-jet sample is the 
same sample studied in Refs. [1,14], and consists of events with one or more jets acquired 
with three trigger thresholds of 20, 50 and 100 GeV, respectively. 

In the signal sample, we study jets recoiling against the lepton-jet (away-jets) and we 
perform three measurements: we count the number of away-jets that contain a lepton (SLT 
tag); that contain an SLT tag and a SECVTX tag; that contain an SLT tag and a JPB 
tag. The latter two are referred to as supertags. We compare the three measurements to a 
Monte Carlo simulation which is tuned and normalized to the data by equalizing numbers 
of SECVTX and JPB tags. The normalization and tuning procedure serves two purposes: 
it removes the dependence on the efficiency for finding the trigger lepton and ensures that 
the simulation reproduces the heavy-flavor content of the data, respectively. To calibrate 
the efficiency for finding SLT tags or supertags in the simulation, we use rates of SLT tags 
and supertags in generic-jet data (control sample). In Ref. [1], we have compared these 
measurements to a Monte Carlo simulation which was also tuned and normalized to generic- 
jet data by equalizing numbers of SECVTX and JPB tags. These three comparisons are 
used to verify the simulated efficiency for finding SLT tags, and to empirically calibrate the 
efficiency for finding supertags in the simulation. 

This analysis strategy is motivated by the following argument. If low-mass bottom 
squarks existed, they would be produced copiously at the Tevatron. The NLO calculation 
of the process $p\bar{p} \to \tilde{b}\tilde{b}^*$, implemented in the PROSPINO Monte Carlo generator [17], predicts 
a cross section which is approximately 15% of the NLO prediction for the production cross section 
of quarks with the same mass [5]. In Ref. [14], we have tuned, within the theoretical and 
experimental uncertainties, the heavy flavor production cross sections calculated by HERWIG 
to reproduce the rates of SECVTX and JPB tags observed in generic-jet data. If 
the squark lifetime is similar to that of conventional heavy flavors, this tuning of 
the parton-level cross section evaluated by HERWIG (or of the number of simulated 
SECVTX and JPB tags predicted by the simulation) would have absorbed the squark 
production into conventional processes. However, if bottom squarks have a 100% semileptonic 
branching ratio, it is still possible to identify their presence by comparing the observed 
number of jets containing a lepton to that expected from b and c decays. 

We illustrate the procedure used in this paper with a numeric example detailed in Table I. 
The first column is what there would be in the data in the presence of $\tilde{b}$ quarks with 100% 
semileptonic BR ^2. The cross sections in the first column of row A represent approximately 
the different heavy flavor contributions to the generic-jet sample. The second column is 
what one would predict after having tuned a simulation, in which only b and c quarks 
are present, to reproduce the number of SECVTX and JPB tags observed in the sample 
corresponding to the first column of row A, under the assumption that $b$ and $\tilde{b}$ quarks have the 
same lifetime. In row B, we model the requirement that a jet contains a lepton by multiplying 
the heavy flavor cross sections by the respective semileptonic branching ratios BR. A 20% 
excess is observed. In row C, we mimic the case in which two jets contain a lepton, and 
the same analysis leads to an excess of a factor of two. Since a discrepancy that depends 
on the number of leptons could be due to a wrong simulation of the lepton-identification 
efficiency, row D presents the stratagem of tuning again the conventional heavy flavor cross 
sections for producing events with one lepton (second column in row B) to model the cross 
section contributing to events with one lepton (first column in row B) ^3. Next, row E shows 
the result of requiring an additional lepton in sample D: the excess is a factor of 1.5. If one 
chooses, as we did in previous studies, to use sample B to empirically correct the simulated 
efficiency for identifying a lepton, sample E will show a 30% excess. 

^2 The cross sections are predicted using the MNR [5] and PROSPINO [17] Monte Carlo generators, 
the MRS(G) set of structure functions [18], and the renormalization and factorization scales $\mu_0^2 = 
p_T^2 + m_q^2$. We use $m_b = 4.75$ GeV/c$^2$, $m_c = 1.5$ GeV/c$^2$, and $m_{\tilde{b}} = 3.6$ GeV/c$^2$. The cross sections 
are integrated over final-state partons with $p_T > 18$ GeV/c; this threshold is used to mimic the 
generic-jet data. Bottom quarks have a 37% semileptonic branching ratio, BR, due to $b \to l$ and 
$b \to c \to l$ decays, whereas BR = 21% for c quarks [19]. 

^3 This technique also allows us to use the inclusive lepton sample that corresponds to a much 
larger integrated luminosity than that of generic-jet data. 

TABLE I. Comparison between $\sigma = BR_b \times \sigma_{b\bar{b}} + BR_c \times \sigma_{c\bar{c}} + BR_{\tilde{b}} \times \sigma_{\tilde{b}\tilde{b}^*}$, the total 
heavy-flavor production cross section ($b$, $c$, and $\tilde{b}$) contributing to different hypothetical samples, 
and $\sigma^{norm} = BR_b \times \sigma_{b\bar{b}}^{norm} + BR_c \times \sigma_{c\bar{c}}$, the total heavy-flavor cross section determined with a 
conventional-QCD simulation under the hypothesis that scalar quarks have the same lifetime as $b$ 
quarks ($\sigma_{b\bar{b}}^{norm} = \sigma_{b\bar{b}} + \sigma_{\tilde{b}\tilde{b}^*}$). In samples containing leptons, each cross section is also multiplied by 
the appropriate semileptonic branching ratio BR. 

Sample                   $\sigma$ (nb)                                       $\sigma^{norm}$ (nb)                     $\sigma/\sigma^{norm}$

A = generic jets         869 = 298 + 487 + 84                           869 = 382 + 487                     1.0
B = A with one lepton    296 = 0.37 x 298 + 0.21 x 487 + 1.0 x 84       244 = 0.37 x 382 + 0.21 x 487       1.2
C = A with two leptons   146 = 0.37^2 x 298 + 0.21^2 x 487 + 1.0 x 84   74 = 0.37^2 x 382 + 0.21^2 x 487    2.0
D = B renormalized       296 = 110 + 102 + 84                           296 = 194 + 102                     1.0
E = D with one lepton    146 = 0.37 x 110 + 0.21 x 102 + 1.0 x 84       93 = 0.37 x 194 + 0.21 x 102        1.5
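
The arithmetic of Table I can be reproduced with a few lines of Python. The sketch below is only an illustration: the function and variable names are hypothetical, and the inputs are the cross sections and branching ratios quoted in the table. Because it works with unrounded intermediate values, its output can differ from the tabulated integers by less than 1 nb.

```python
# Illustrative sketch; names are hypothetical, inputs are the values quoted in Table I.
BR_B, BR_C, BR_SQUARK = 0.37, 0.21, 1.0          # semileptonic branching ratios
SIG_B, SIG_C, SIG_SQUARK = 298.0, 487.0, 84.0    # generic-jet cross sections (nb)


def with_leptons(sig_b, sig_c, sig_squark, n_leptons):
    """Cross section (nb) after requiring n_leptons semileptonic decays."""
    return (BR_B ** n_leptons * sig_b
            + BR_C ** n_leptons * sig_c
            + BR_SQUARK ** n_leptons * sig_squark)


def table_one():
    """Return {row: (sigma, sigma_norm, ratio)} for rows A to E."""
    rows = {}
    # Conventional simulation: the squark yield is absorbed into the b term.
    for row, n in (("A", 0), ("B", 1), ("C", 2)):
        sigma = with_leptons(SIG_B, SIG_C, SIG_SQUARK, n)
        sigma_norm = with_leptons(SIG_B + SIG_SQUARK, SIG_C, 0.0, n)
        rows[row] = (sigma, sigma_norm, sigma / sigma_norm)
    # Row D: retune the conventional simulation on the one-lepton sample (row B).
    d_b, d_c, d_squark = BR_B * SIG_B, BR_C * SIG_C, BR_SQUARK * SIG_SQUARK
    sigma_d = d_b + d_c + d_squark
    sigma_d_norm = (d_b + d_squark) + d_c
    rows["D"] = (sigma_d, sigma_d_norm, sigma_d / sigma_d_norm)
    # Row E: require one more lepton in sample D.
    sigma_e = with_leptons(d_b, d_c, d_squark, 1)
    sigma_e_norm = with_leptons(d_b + d_squark, d_c, 0.0, 1)
    rows["E"] = (sigma_e, sigma_e_norm, sigma_e / sigma_e_norm)
    return rows


if __name__ == "__main__":
    for row, (sigma, sigma_norm, ratio) in table_one().items():
        print(f"{row}: sigma = {sigma:6.1f} nb, "
              f"sigma_norm = {sigma_norm:6.1f} nb, ratio = {ratio:.2f}")
```

Dividing the row E ratio by the row B ratio gives roughly 1.3, which corresponds to the 30% excess expected when sample B is used to correct the simulated lepton-identification efficiency.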

III. THE CDF DETECTOR 



The events used for this analysis have been collected with the CDF detector during the 
1993-1995 run of the Tevatron collider at Fermilab. The CDF detector is described in detail 
in Ref. [20]. We review the detector components most relevant to this analysis. Inside the 
1.4 T solenoid the silicon microvertex detector (SVX) [21], a vertex drift chamber (VTX), 
and the central tracking chamber (CTC) provide the tracking and momentum information 
for charged particles. The CTC is a cylindrical drift chamber containing 84 measurement 
layers. It covers the pseudo-rapidity interval $|\eta| < 1.1$, where $\eta = -\ln[\tan(\theta/2)]$. In CDF, $\theta$ 
is the polar angle measured from the proton direction, $\phi$ is the azimuthal angle, and $r$ is 
the radius from the beam axis ($z$-axis). The SVX consists of four layers of silicon micro-strip 
detectors, located at radii between 2.9 and 7.9 cm from the beam line, and provides spatial 
measurements in the $r$-$\phi$ plane with a resolution of 13 $\mu$m. 

Electromagnetic (CEM) and hadronic (CHA) calorimeters with projective tower geometry 
are located outside the solenoid and cover the pseudo-rapidity region $|\eta| < 1.1$, with 
a segmentation of $\Delta\phi = 15^\circ$ and $\Delta\eta = 0.11$. A layer of proportional chambers (CES) is 
embedded near shower maximum in the CEM and provides a more precise measurement of 
the electromagnetic shower position. Two muon subsystems in the central rapidity region 
($|\eta| < 0.6$) are used for muon identification: the central muon chambers (CMU), located behind 
the CHA calorimeter, and the central upgrade muon chambers (CMP), located behind 
an additional 60 cm of steel. The central muon extension (CMX) covers approximately 71% 
of the solid angle for $0.6 < |\eta| < 1.0$ and, in this analysis, is used only to increase the soft 
muon acceptance. 

CDF uses a three-level trigger system. At the first two levels, decisions are made with 
dedicated hardware. The information available at this stage includes energy deposited in 
the CEM and CHA calorimeters, high-$p_T$ tracks found in the CTC by a fast track processor 
(CFT), and track segments found in the muon subsystems. The data used in this study were 
collected using the electron and muon low-$p_T$ triggers. The first two levels of these triggers 
require a track with $p_T > 7.5$ GeV/c found by the CFT. In the case of the electron trigger, 
the CFT track must be matched to a CEM cluster with transverse energy $E_T > 8$ GeV. 
In the case of the muon trigger, the CFT track must be matched to a reconstructed track 
segment in both sets of central muon detectors (CMU and CMP). 

At the third level of the trigger, the event selection is based on a version of the off-line 
reconstruction programs optimized for speed. The lepton selection criteria used by the third 
level trigger are similar to those described in the next section. 

IV. DATA SAMPLE SELECTION AND HEAVY FLAVOR TAGGING 

Central electrons and muons that passed the trigger prerequisite are identified with the 
same criteria used to select the W+ jet sample described in Refs. [1,14]. 

Electron candidates are identified using information from both calorimeter and tracking 
detectors. We require the following: (1) the ratio of hadronic to electromagnetic energy of 
the cluster, Ehad/Eem < 0.05; (2) the ratio of cluster energy to track momentum, E/p < 1.5; 
(3) a comparison of the lateral shower profile in the calorimeter cluster with that of test- 
beam electrons, $L_{shr} < 0.2$; (4) the distance between the extrapolated track position and the 
CES measurement in the $r$-$\phi$ and $z$ views, $\Delta x < 1.5$ cm and $\Delta z < 3.0$ cm, respectively; (5) 
a comparison of the CES shower profile with those of test-beam electrons, $\chi^2_{strip} \leq 20$; (6) 
the distance between the interaction vertex and the reconstructed track in the $z$-direction, 
$z$-vertex match < 5 cm. Fiducial cuts on the electromagnetic shower position, as measured 
in the CES, are applied to ensure that the electron candidate is away from the calorimeter 
boundaries and the energy is well measured. Electrons from photon conversions are removed 
using an algorithm based on track information [14]. 

Muons are identified by requiring a match between a CTC track and track segments 
in both the CMU and CMP muon chambers. The following variables are used to separate 
muons from hadrons interacting in the calorimeter and cosmic rays: (1) an energy deposition 
in the electromagnetic and hadronic calorimeters characteristic of minimum ionizing 
particles, $E_{em} < 2$ GeV and $E_{had} < 6$ GeV, respectively; (2) $E_{em} + E_{had} > 0.1$ GeV; (3) the 
distance of closest approach of the reconstructed track to the beam line in the transverse 
plane (impact parameter), $d < 0.3$ cm; (4) the $z$-vertex match < 5 cm; (5) the distance 
between the extrapolated track and the track segment in the muon chamber, $\Delta x = r\Delta\phi < 
2$ cm. 

We select events containing at least one electron with $E_T > 8$ GeV or one muon with 
$p_T > 8$ GeV/c. This selection produces a data sample quite similar to that used for the 
measurement of the $B^0$-$\bar{B}^0$ flavor oscillation [22]. Since we are interested in semileptonic 
decays of heavy quarks, trigger leptons are also required to be non-isolated; we require 
$I > 0.1$, where the isolation $I$ is defined as the ratio of the additional transverse energy 
deposited in the calorimeter in a cone of radius $R = \sqrt{\delta\phi^2 + \delta\eta^2} = 0.4$ around the lepton 
direction to the lepton transverse energy. 
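
The isolation requirement can be sketched in Python as follows. This is a minimal illustration and not the CDF reconstruction code: the tuple layout `(et, eta, phi)`, the function names, and the assumption that the tower list already excludes the lepton's own energy deposit are all hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def cone_distance(eta1, phi1, eta2, phi2):
    """R = sqrt(delta_phi^2 + delta_eta^2) between two directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def isolation(lepton, towers, cone=0.4):
    """Additional E_T inside the cone divided by the lepton E_T.

    lepton: (et, eta, phi); towers: iterable of (et, eta, phi).
    Assumption: towers do not include the lepton's own deposit.
    """
    lep_et, lep_eta, lep_phi = lepton
    extra_et = sum(et for et, eta, phi in towers
                   if cone_distance(eta, phi, lep_eta, lep_phi) < cone)
    return extra_et / lep_et


def is_non_isolated(lepton, towers, threshold=0.1):
    """Trigger-lepton requirement I > 0.1 quoted in the text."""
    return isolation(lepton, towers) > threshold
```

Wrapping the azimuthal difference matters for leptons near $\phi = 0$, where a tower at $\phi$ close to $2\pi$ is in fact adjacent to the lepton.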

Further selection of the data sample is based upon jet reconstruction. Jets are reconstructed 
from the energy deposited in the calorimeter using a clustering algorithm with a 
fixed cone of radius $R = 0.4$. A detailed description of the algorithm can be found in Ref. [23]. 
Jet energies can be mismeasured for a variety of reasons (calorimeter non-linearity, loss of 
low momentum particles because of the magnetic field, contributions from the underlying 
event, out-of-cone losses, undetected energy carried by muons and neutrinos). Corrections, 
which depend on the jet $E_T$ and $\eta$, are applied to jet energies; they compensate for these 
mismeasurements on average but do not improve the jet energy resolution. In this analysis 
we select central jets (taggable) by requiring that they include at least two SVX tracks [24]. 

We require the trigger lepton to be contained in a cone of radius $R = 0.4$ around the axis 
of a taggable jet with uncorrected transverse energy $E_T > 15$ GeV. This jet will be referred 
to as lepton-jet or e-jet or $\mu$-jet. We also require the presence of at least one additional 
taggable jet (away-jet) with $E_T > 15$ GeV. The requirement of a non-isolated lepton inside 
a jet rejects most of the leptonic decays of vector bosons and the Drell-Yan contribution. 
The requirement of two jets with $E_T > 15$ GeV reduces the statistics of the data sample ^4. This 
$E_T$ threshold is chosen because efficiencies and backgrounds of the SECVTX, JPB and SLT 
algorithms have been evaluated only for jets with transverse energy above this value [14]. 
We select 68544 events with an e-jet and 14966 events with a $\mu$-jet. 

In order to determine the bottom and charmed content of the data we use two algorithms 
(SECVTX and JPB) which have been studied in detail in Refs. [1,14]. SECVTX is based on 
the determination of the primary event vertex and the reconstruction of additional secondary 
vertices using displaced SVX tracks contained inside jets. Jet-probability (JPB) compares 
track impact parameters to measured resolution functions in order to calculate for each jet 
a probability that there are no long-lived particles in the jet cone [16]. 

The simulation of these tagging algorithms makes use of parametrizations of the detector 
response for single tracks, which were derived from the data. Because of the naivety 
of the method, these algorithms have required several empirical adjustments. SECVTX 
tags not produced by hadrons with heavy flavor (mistags) are underestimated by the detector 
simulation. Therefore SECVTX and JPB mistags are evaluated using a parametrized 
probability derived from generic-jet data [14], and are subtracted from the data in order to 
compare to the heavy flavor simulation. We estimate that the mistag removal has a 10% 
uncertainty [14]. 

^4 A jet with uncorrected transverse energy $E_T = 15$ GeV corresponds to a parton with average 
transverse energy $\langle E_T \rangle = 20$ GeV. 

The tagging efficiency of these algorithms is not well modeled by the parametrized simulation. 
In Ref. [14], we have used generic jets and a subset of the inclusive electron sample to 
determine the data-to-simulation scale factors for the tagging efficiency of these algorithms. 
The data-to-simulation scale factor of the SECVTX tagging efficiency for b-jets is measured 
to be 1.25 ± 0.08. The number of tags in the simulation is multiplied by this scale factor, 
and we add a 6% uncertainty to the prediction of tags. The data-to-simulation scale factor 
for c jets has been measured to be 0.92 ± 0.28 [14]; because of its large uncertainty, this 
scale factor is not implemented into the simulation, but we add a 28% uncertainty to the 
prediction of tags due to c jets. The data-to-simulation scale factor for the jet-probability 
algorithm has been measured to be 0.96 ± 0.05. The number of tags in the simulation is 
multiplied by this scale factor, and we add a 6% uncertainty to the prediction of tags. 
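
The way these scale factors enter a prediction can be sketched as below. The dictionary keys, the function name, and the example count are hypothetical; treating the JPB scale factor as independent of jet flavor is an assumption of this sketch, since the text does not specify it.

```python
# Illustrative sketch; keys and function name are hypothetical.
# Each entry is (applied scale factor, relative uncertainty added to the prediction).
SCALE_FACTORS = {
    "SECVTX_b": (1.25, 0.06),  # measured 1.25 +- 0.08, applied
    "SECVTX_c": (1.00, 0.28),  # measured 0.92 +- 0.28, NOT applied
    "JPB": (0.96, 0.06),       # measured 0.96 +- 0.05, applied (flavor-independent by assumption)
}


def predict_tags(kind, n_simulated_tags):
    """Return (predicted tags, absolute uncertainty from the scale factor)."""
    scale, rel_unc = SCALE_FACTORS[kind]
    predicted = n_simulated_tags * scale
    return predicted, predicted * rel_unc


if __name__ == "__main__":
    # Hypothetical input: 1000 simulated SECVTX tags in b jets.
    print(predict_tags("SECVTX_b", 1000.0))
```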

In this study, we also probe the heavy-quark contribution by searching a jet for soft 
leptons (e and $\mu$) produced by the decay of hadrons with heavy flavor. The soft lepton 
tagging algorithm is applied to sets of CTC tracks associated with jets with $E_T > 15$ GeV 
and $|\eta| < 2.0$. CTC tracks are associated with a jet if they are inside a cone of radius 0.4 
centered around the jet axis. In order to maintain high efficiency, the lepton $p_T$ threshold 
is set low at 2 GeV/c. To search for soft electrons the algorithm extrapolates each track 
to the calorimeter and attempts to match it to a CES cluster. The matched CES cluster is 
required to be consistent in shape and position with the expectation for electron showers. 
In addition, it is required that $0.7 < E/p < 1.5$ and $E_{had}/E_{em} < 0.1$. The track specific 
ionization ($dE/dx$), measured in the CTC, is required to be consistent with the electron 
hypothesis. The efficiency of the selection criteria has been determined using a sample of 
electrons produced by photon conversions [25]. 

To identify soft muons, track segments reconstructed in the CMU, CMP and CMX 
systems are matched to CTC tracks. The CMU and CMX systems are used to identify 
muons with $2 < p_T < 3$ GeV/c and $p_T > 2$ GeV/c, respectively. Muon candidate tracks 
with $p_T \geq 3$ GeV/c within the CMU and CMP fiducial volume are required to match to 
track segments in both systems. The reconstruction efficiency has been measured using 
samples of muons from $J/\psi \to \mu^+\mu^-$ and $Z \to \mu^+\mu^-$ decays [25]. 

In the simulation, SLT tags are defined as tracks matching at generator level electrons or 
muons originating from b- or c-hadron decays (including those coming from $\tau$ or $\psi$ cascade 
decays). The SLT tagging efficiency is implemented in the simulation by weighting these 
tracks with the efficiency of each SLT selection criterion measured using the data. The 
uncertainty of the SLT efficiency is estimated to be 10% and includes the uncertainty of the 
semileptonic branching ratios [25,26]. 

Rates of fake SLT tags are evaluated using a parametrized probability, $P_f$, derived in 
special samples of generic-jet data, and are subtracted from the data. This parametrization 
has been derived from the probability $P$ that a track satisfying the fiducial requirements 
produces an SLT tag. This probability is computed separately for each lepton flavor and 
detector type and is parametrized as a function of the transverse momentum and isolation of 
the track [25,26]. In Ref. [14], by fitting the impact parameter distributions of the SLT tracks 
in the same generic-jet samples used to derive the $P$ parametrization, we have estimated 
that $P_f = (0.740 \pm 0.074) \times P$. It follows that, in generic-jet data, the probability that a 
track corresponds to a lepton arising from heavy-flavor decays is $P_{hf} = (0.260 \pm 0.074) \times P$. 
Since we search a jet for SLT candidates in a cone of radius 0.4 around its axis, the 
probability of finding a fake SLT tag in a jet is $P_f^{jet}(N) = \sum_{i=1}^{N} (1 - P_f^{jet}(i-1)) \times P_f^i$, 
where $N$ is the number of tracks contained in the jet cone, $P_f^i$ is the probability for the 
$i$-th track, and $P_f^{jet}(0) = 0$. In generic jets, the probability 
of finding an SLT tag due to heavy flavor is $P_{hf}^{jet}(N) = \sum_{i=1}^{N} (1 - P_{hf}^{jet}(i-1)) \times P_{hf}^i$. In Ref. 
[14], the uncertainty of the $P^{jet} = P_f^{jet} + P_{hf}^{jet}$ parametrization has been estimated to be 
no larger than 10% by comparing its prediction to the number of SLT tags observed in 7 
additional generic-jet samples. 
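
The per-jet sum above is a running accumulation over the tracks in the jet cone, and it can be written compactly in Python. The sketch below is illustrative: the function names and the example track probabilities are hypothetical, and only the central values 0.740 and 0.260 are taken from the text.

```python
def jet_tag_probability(track_probs):
    """P_jet(N) = sum_{i=1..N} (1 - P_jet(i-1)) * P_i, with P_jet(0) = 0.

    track_probs: per-track tag probabilities P_i for the tracks in the jet cone.
    """
    p_jet = 0.0
    for p_track in track_probs:
        # Add the chance that this track is tagged given no earlier track was.
        p_jet += (1.0 - p_jet) * p_track
    return p_jet


def split_track_probability(p_total, fake_fraction=0.740):
    """Split the track probability P into (P_f, P_hf) using the central values."""
    return fake_fraction * p_total, (1.0 - fake_fraction) * p_total


if __name__ == "__main__":
    # Hypothetical per-track probabilities P for a three-track jet.
    probs = [0.010, 0.020, 0.005]
    fake = [split_track_probability(p)[0] for p in probs]
    heavy = [split_track_probability(p)[1] for p in probs]
    print(jet_tag_probability(fake), jet_tag_probability(heavy))
```

The accumulation is algebraically the same as one minus the product of the per-track probabilities of not being tagged, which gives a convenient cross-check of an implementation.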

The efficiency for finding supertags (SLT tags in jets with SECVTX or JPB tags) in 
the simulation is additionally corrected with a data-to-simulation scale factor, 0.85 ± 0.05, 
derived in a previous study of generic-jet data [1]. The number of simulated supertags 
is multiplied by this factor, and we add a 6% uncertainty to the prediction of supertags. 
As mentioned earlier, the simulation of the SLT algorithm uses parametrized efficiencies 
measured using samples of electrons from photon conversions and muons from $J/\psi \to \mu^+\mu^-$ 
and $Z \to \mu^+\mu^-$ decays. Since these leptons are generally more isolated than leptons from 
heavy flavor decays, we have some evidence that the efficiency of the SLT algorithm in the 
simulation is overestimated. However, since a reduced efficiency for finding supertags could 
also be generated by a reduced efficiency of the SECVTX (JPB) algorithm in jets containing 
a soft lepton, we have chosen to correct the simulated efficiency for finding supertags, but 
not the efficiency of the simulated SLT algorithm [1]. 

V. DATA SAMPLE COMPOSITION 

The lepton-jets in our sample come from three sources: $b\bar{b}$ production, $c\bar{c}$ production, 
and light quark or gluon production in which a hadron mimics the experimental signature 
of a lepton (fake lepton). The yield of fake leptons in light jets returned by our detector 
simulation cannot be trusted, and the $b\bar{b}$ and $c\bar{c}$ production cross sections have large experimental 
and theoretical uncertainties. Therefore, we use measured rates of lepton-jets with 
SECVTX and JPB tags due to heavy flavor (i.e. after mistag removal) in order to separate 
the fractions of lepton-jets due to $b\bar{b}$ production and $c\bar{c}$ production. The simultaneous use 
of the two tagging algorithms was pioneered in Ref. [14]; it allows us to separate the b- and 
c-quark contributions because both algorithms have the same tagging efficiency for b jets, 
while for c jets the efficiency of the JPB algorithm is approximately 2.5 times larger than 
that of the SECVTX algorithm. The b and c content of away-jets is also determined with 
this method. 
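
The reason two taggers with different relative efficiencies for c jets can separate the b and c contributions is that the two tag counts form a pair of independent linear equations. The following Python sketch illustrates this idea only; it is not the tuning procedure of this paper, which adjusts the simulated cross sections. The function name and the efficiency values used in the example are hypothetical, and only the factor of 2.5 is taken from the text.

```python
def solve_b_and_c(n_sec, n_jpb, eff_b, eff_c_sec, jpb_over_sec_c=2.5):
    """Solve for the numbers of b and c jets from two tag counts.

    Model (illustrative assumption):
        n_sec = eff_b * n_b + eff_c_sec * n_c
        n_jpb = eff_b * n_b + jpb_over_sec_c * eff_c_sec * n_c
    eff_b is the common b-jet efficiency of both algorithms.
    """
    eff_c_jpb = jpb_over_sec_c * eff_c_sec
    # Subtracting the two equations removes the b term.
    n_c = (n_jpb - n_sec) / (eff_c_jpb - eff_c_sec)
    n_b = (n_sec - eff_c_sec * n_c) / eff_b
    return n_b, n_c


if __name__ == "__main__":
    # Hypothetical efficiencies and tag counts, for illustration only.
    print(solve_b_and_c(n_sec=450.0, n_jpb=525.0, eff_b=0.4, eff_c_sec=0.1))
```

If the two algorithms had the same efficiency ratio for b and c jets, the denominator would vanish and the two counts would carry no information on the flavor composition.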

The heavy flavor content of away-jets recoiling against a lepton-jet with heavy flavor 
depends on the production mechanisms (LO terms yield higher fractions of heavy flavor than 
NLO terms). Therefore, we tune the cross sections of the various production mechanisms 
predicted by the simulation to reproduce the observed number of lepton- and away-jets with 
SECVTX and JPB tags due to heavy flavor. 

The fraction $F_{hf}$ of lepton-jets due to heavy flavor, before tagging, is estimated using 
the tuned simulation. The remaining fraction, $(1 - F_{hf})$, of lepton-jets is attributed to fake 
leptons in light jets. The number of tags in away-jets, which recoil against a lepton-jet 
without heavy flavor, is predicted as $N_{a-jet} \times (1 - F_{hf}) \times P_{GQCD}$, where $N_{a-jet}$ is the total 
number of away-jets, and $P_{GQCD}$ is the average probability of tagging away-jets that recoil 
against lepton-jets without heavy flavor. The average probability $P_{GQCD}$ is estimated by 
weighting all the away-jets with a parametrized probability of finding SECVTX (or JPB) 
tags due to heavy flavor in generic-jet data [14]. The number $N_{a-jet} \times (1 - F_{hf}) \times P_{GQCD}$ 
is subtracted from the number of tagged away-jets with heavy flavor that are used to tune 
the simulation. In Ref. [14], this method has been cross-checked by using it also in a sample 
of data in which electrons are identified as coming from photon conversions. The heavy-flavor 
purity of e-jets due to photon conversions (approximately 8%) is depleted with respect to that 
of e-jets not due to conversions (approximately 50%). The study in Ref. [14] shows that the usage of 
the probability $P_{GQCD}$ allows us to model the observed rate of tagged away-jets in both the 
electron and conversion samples within a 10% statistical uncertainty. Therefore we attribute 
a 10% uncertainty to the average probability $P_{GQCD}$. 
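
The subtraction described above can be sketched in Python as follows. The function names and all numerical inputs in the example are hypothetical; only the form $N_{a-jet} \times (1 - F_{hf}) \times P_{GQCD}$ and the 10% uncertainty on $P_{GQCD}$ are taken from the text.

```python
def tags_from_fake_lepton_events(n_away_jets, f_hf, p_gqcd, rel_unc=0.10):
    """Predicted tags in away-jets recoiling against lepton-jets without heavy flavor.

    Returns (prediction, absolute uncertainty from the 10% uncertainty on P_GQCD).
    """
    predicted = n_away_jets * (1.0 - f_hf) * p_gqcd
    return predicted, predicted * rel_unc


def tags_for_tuning(n_tagged_away_jets, n_away_jets, f_hf, p_gqcd):
    """Tagged away-jets left for tuning the simulation after the subtraction.

    The returned uncertainty covers only the P_GQCD term.
    """
    background, background_unc = tags_from_fake_lepton_events(
        n_away_jets, f_hf, p_gqcd)
    return n_tagged_away_jets - background, background_unc


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(tags_for_tuning(n_tagged_away_jets=1500.0, n_away_jets=10000.0,
                          f_hf=0.5, p_gqcd=0.02))
```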

A. Simulation of heavy flavor production and decay 

We use the HERWIG Monte Carlo generator ^5 to describe the fraction of data in which the 
lepton-jets contain hadrons with heavy flavor. We use the MRS(G) set of parton distribution 
functions [18], and set $m_c = 1.5$ GeV/c$^2$ and $m_b = 4.75$ GeV/c$^2$. In the generic hard parton 
scattering, $b\bar{b}$ and $c\bar{c}$ pairs are generated by HERWIG through processes of order $\alpha_s^2$ such 
as $gg \to b\bar{b}$ (direct production). Processes of order $\alpha_s^3$ are implemented in the generator 
through flavor excitation processes, such as $gb \to gb$, or gluon splitting, in which the process 
$gg \to gg$ is followed by $g \to b\bar{b}$. The HERWIG generator neglects virtual emission graphs, 
but, like all parton shower Monte Carlo generators, also includes higher than NLO diagrams. 

The bottom and charmed hadrons produced in the final state are decayed using the 
CLEO Monte Carlo generator (QQ) [27]. At this generation level, we retain only final states 
which contain hadrons with heavy flavor and at least one lepton with $p_T > 8$ GeV/c. The 
accepted events are passed through a simulation of the CDF detector (QFL) that is based 
on parametrizations of the detector response derived from the data. After the simulation of 
the CDF detector, the Monte Carlo events are treated as real data. The simulated inclusive 
electron sample has 27136 events, corresponding to a luminosity of 98.9 pb$^{-1}$. The simulated 
inclusive muon sample has 7266 events, corresponding to a luminosity of 55.1 pb$^{-1}$. The 
simulated samples have approximately the same luminosity as the data. 

^5 We use option 1500 of version 5.6, generic $2 \to 2$ hard scattering with $p_T > 13$ GeV/c (see 
Appendix A in Ref. [1] for more details). 

VI. DETERMINATION OF THE RATES OF SECVTX AND JPB TAGS DUE TO HEAVY FLAVOR IN THE DATA

The heavy flavor content of the data is estimated from the number of jets tagged with 
the SECVTX and JPB algorithms. The numbers of lepton-jets and away-jets in the data,
N_l-jet and N_a-jet, are listed in Table II. N_l-jet is equal to the number of events and N_a-jet
is about 10% larger, which means that about 10% of the events have two away-jets. This
table lists the following numbers of tags due to the presence of hadrons with heavy flavor:

1. T^SEC_l-jet and T^JPB_l-jet, the numbers of lepton-jets with a SECVTX and JPB tag, respectively.

2. T^SEC_a-jet and T^JPB_a-jet, the numbers of away-jets with a SECVTX and JPB tag, respectively.

3. DT^SEC and DT^JPB, the numbers of events in which the lepton-jet and one away-jet
are both tagged by SECVTX and JPB, respectively.

The uncertainty on the number of tags due to heavy flavor in Table II includes the 10% 
error of the mistag removal. 
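
As an illustration of how the uncertainties quoted in Table II appear to be built, the sketch below combines the Poisson error on the raw tag count in quadrature with 10% of the removed mistags. The function name and the quadrature rule are assumptions, since the text only states that the 10% mistag error is included; the rule reproduces the SECVTX entries of Table II to within rounding.

```python
import math

def tags_due_to_heavy_flavor(raw_tags, mistags, mistag_rel_err=0.10):
    """Return (tags, error) after mistag removal.

    Assumption: Poisson error on the raw count, added in quadrature
    to a 10% relative error on the removed mistags.
    """
    tags = raw_tags - mistags
    error = math.sqrt(raw_tags + (mistag_rel_err * mistags) ** 2)
    return tags, error

# Electron data, lepton-jets with a SECVTX tag (Table II): 10221 raw tags, 105.7 mistags
tags, error = tags_due_to_heavy_flavor(10221, 105.7)
print(f"{tags:.1f} +- {error:.1f}")
```

The same call with the away-jet inputs (4494 raw tags, 140.7 mistags) gives values close to the 4353.3 ± 68.5 listed in the table.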

Events in which the lepton-jet does not contain heavy flavor are not described by the 
heavy flavor simulation. In these events, the number of away-jets with tags due to heavy 
flavor is predicted using the average tagging probabilities Pgqcd listed in Table II. These 
probabilities are used to correct the numbers of tagged away-jets that will be used to tune 
the heavy flavor simulation. 

TABLE II. Number of tags due to heavy flavor in the inclusive lepton data (raw counts/removed
mistags are indicated in parentheses). Pgqcd is the probability of tagging away-jets recoiling
against lepton-jets without heavy flavor.



                 Electron data                                Muon data
Tag type                                           Pgqcd                                      Pgqcd
N_l-jet          68544                                        14966
N_a-jet          73335                                        16460
T^SEC_l-jet      10115.3 ± 101.7  (10221/105.7)               3657.3 ± 60.8  (3689/31.7)
T^JPB_l-jet      11165.4 ± 115.8  (11591/425.6)               4068.6 ± 66.2  (4204/135.4)
T^SEC_a-jet      4353.3 ± 68.5    (4494/140.7)     1.56%      1054.6 ± 33.3  (1094/39.4)      1.67%
T^JPB_a-jet      5018.9 ± 98.9    (5661/642.1)     2.45%      1265.2 ± 41.1  (1427/161.8)     2.63%
DT^SEC           1375.2 ± 37.6    (1405/29.8)                 452.6 ± 21.6   (465/12.4)
DT^JPB           1627.8 ± 43.7    (1754/126.2)                546.4 ± 25.1   (600/53.6)

VII. TAGGING RATES IN THE SIMULATION 

Numbers of tags in simulated events which contain heavy flavor (h.f.), characterized by
the prefix HF, are listed in Table III.

Different production mechanisms are separated by inspecting at generator level the flavor
of the initial and final state partons involved in the hard scattering. We attribute to flavor
excitation the events in which at least one of the incoming partons has heavy flavor and to
direct production the events in which the incoming partons have no heavy flavor and the
outgoing partons both have heavy flavor. Pairs of heavy quarks which appear at the end of
the evolution process are attributed to gluon splitting. The flavor type of each simulated jet
is determined by inspecting its hadron composition at generator level.
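
The classification rules above can be sketched as a small function. This is only an illustration: the function name, the use of PDG particle codes (4 for charm, 5 for bottom, 21 for the gluon), and the restriction to events already known to contain heavy flavor are assumptions, not a description of the generator-level code used in the analysis.

```python
HEAVY = {4, 5}  # |PDG id| of charm and bottom quarks

def production_mechanism(incoming, outgoing):
    """Classify a 2->2 hard scattering from the PDG ids of its partons.

    Assumes the event is already known to contain heavy flavor.
    """
    heavy_in = [p for p in incoming if abs(p) in HEAVY]
    heavy_out = [p for p in outgoing if abs(p) in HEAVY]
    if heavy_in:
        # at least one incoming parton carries heavy flavor
        return "flavor excitation"
    if len(heavy_out) == len(outgoing) == 2:
        # no heavy flavor in the initial state, both outgoing partons heavy
        return "direct production"
    # heavy quarks appear only during the shower evolution
    return "gluon splitting"

print(production_mechanism([21, 21], [5, -5]))   # gg -> b bbar
print(production_mechanism([21, 5], [21, 5]))    # gb -> gb
print(production_mechanism([21, 21], [21, 21]))  # gg -> gg, then g -> b bbar
```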

TABLE III. Number of jets before and after tagging in the inclusive lepton simulation (dir,
f.exc and gsp indicate the direct production, flavor excitation and gluon splitting contributions).
The row indicated as "h.f./light" lists the rates of away-jets with and without heavy flavors and
highlights the properties of different production mechanisms. Data-to-simulation scale factors for
the various tagging algorithms are not yet applied.



Electron simulation

Tag type         b-dir         c-dir         b-f.exc       c-f.exc       b-gsp         c-gsp
HF_l-jet         [illegible]   947           10779         2786          [illegible]   1690
HF_a-jet         5848          977           11280         2913          6025          1877
h.f./light       5407/441      899/78        1605/9675     367/2546      707/5318      [illegible]
HFT^SEC_l-jet    [illegible]   [illegible]   3624          194           [illegible]   [illegible]
HFT^JPB_l-jet    2392          163           4531          602           2106          356
HFT^SEC_a-jet    [illegible]   [illegible]   480           68            [illegible]   [illegible]
HFT^JPB_a-jet    2622          203           584           136           276           58
HFDT^SEC         678           5             157           4             78            1
HFDT^JPB         [illegible]   [illegible]   303           25            [illegible]   [illegible]

Muon simulation

Tag type         b-dir         c-dir         b-f.exc       c-f.exc       b-gsp         c-gsp
HF_l-jet         1285          298           2539          942           1455          747
HF_a-jet         1358          313           2705          994           1708          816
h.f./light       1206/152      278/35        422/2283      124/870       171/1537      48/768
HFT^SEC_l-jet    569           34            1131          83            652           92
HFT^JPB_l-jet    707           77            1386          229           830           202
HFT^SEC_a-jet    498           29            132           13            54            11
HFT^JPB_a-jet    627           62            173           34            60            21
HFDT^SEC         218           3             59            2             20            1
HFDT^JPB         347           12            105           7             50            6
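
VIII. TUNING OF THE SM SIMULATION USING SECVTX AND JPB TAGS

Following the procedure outlined in Sec. V, we fit the data with the heavy flavor simulation
using rates of jets before and after tagging with the SECVTX and JPB algorithms.
In the fit, we tune the cross sections of the different flavor production mechanisms. Starting
from Table III, the simulated rate of jets before tagging can be written as:

$$
HF_{l,i} = K_l \cdot \left( HF_{b\text{-dir},l,i} + bf \cdot HF_{b\text{-f.exc},l,i} + bg \cdot HF_{b\text{-gsp},l,i} \right)
         + K_l \cdot \left( c \cdot HF_{c\text{-dir},l,i} + cf \cdot HF_{c\text{-f.exc},l,i} + cg \cdot HF_{c\text{-gsp},l,i} \right)
$$

The rates of tagged jets are:

$$
HFT^{j}_{l,i} = K_l \cdot SF^{j}_{b} \left( HFT^{j}_{b\text{-dir},l,i} + bf \cdot HFT^{j}_{b\text{-f.exc},l,i} + bg \cdot HFT^{j}_{b\text{-gsp},l,i} \right)
              + K_l \cdot SF^{j}_{c} \left( c \cdot HFT^{j}_{c\text{-dir},l,i} + cf \cdot HFT^{j}_{c\text{-f.exc},l,i} + cg \cdot HFT^{j}_{c\text{-gsp},l,i} \right)
$$

and the rates of events with a double tag are:

$$
HFDT^{j}_{l} = K_l \cdot (SF^{j}_{b})^2 \left( HFDT^{j}_{b\text{-dir},l} + bf \cdot HFDT^{j}_{b\text{-f.exc},l} + bg \cdot HFDT^{j}_{b\text{-gsp},l} \right)
             + K_l \cdot (SF^{j}_{c})^2 \left( c \cdot HFDT^{j}_{c\text{-dir},l} + cf \cdot HFDT^{j}_{c\text{-f.exc},l} + cg \cdot HFDT^{j}_{c\text{-gsp},l} \right)
$$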

where the index l indicates electron or muon data, i indicates the lepton- or the away-jet, and
j indicates the type of tag (SECVTX or JPB). The fit parameters K_l account for the slightly
different luminosity between data and simulation; they also include the normalization of the
direct b-production cross section. The factors c, cf, cg, bf, and bg are fit parameters used
to adjust the remaining cross sections calculated by HERWIG with respect to the direct $b\bar{b}$
production. The number of tags predicted by the simulation is obtained by multiplying the
numbers in Table III by the appropriate scale factor. The fit parameters SF_b and SF_c are
used to account for the uncertainties of the corresponding scale factors. The simulated rates
HFT^j_l,i and HFDT^j_l have statistical errors δ^j_T,l,i and δ^j_DT,l.
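
To make the weighting of the six production mechanisms concrete, the sketch below evaluates the untagged rate for away-jets in the muon simulation. The dictionary layout and function name are illustrative assumptions; the inputs are the muon HF_a-jet row of Table III and the rounded central values of Table IV, so the result only approximates the normalized rates of Table X (whose HF_a-jet muon entries sum to about 9760).

```python
# Fit parameters (rounded central values of Table IV)
PARAMS = {"K": {"e": 1.02, "mu": 1.08},
          "c": 1.01, "bf": 1.02, "cf": 1.10, "bg": 1.40, "cg": 1.40}

def simulated_rate(raw, lepton, p=PARAMS):
    """HF_{l,i}: weighted sum of the six production mechanisms (untagged jets)."""
    b_part = raw["b-dir"] + p["bf"] * raw["b-f.exc"] + p["bg"] * raw["b-gsp"]
    c_part = (p["c"] * raw["c-dir"] + p["cf"] * raw["c-f.exc"]
              + p["cg"] * raw["c-gsp"])
    return p["K"][lepton] * (b_part + c_part)

# Away-jets in the muon simulation before normalization (Table III)
muon_away = {"b-dir": 1358, "c-dir": 313, "b-f.exc": 2705,
             "c-f.exc": 994, "b-gsp": 1708, "c-gsp": 816}
rate = simulated_rate(muon_away, "mu")
print(f"normalized HF_a-jet (muons): {rate:.0f}")
```

The tagged and double-tagged rates follow the same pattern, with the additional factors SF and SF squared multiplying the b and c parts.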

As mentioned at the end of Sec. V, the fraction of the data which contains heavy flavor
and is described by the simulation is F_hf,l = HF_l,l-jet / N_l,l-jet. Therefore we fit the simulated
rates to the quantities

$$
HFT^{j}_{l,l\text{-jet}}(DATA) = T^{j}_{l,l\text{-jet}}
$$
$$
HFT^{j}_{l,a\text{-jet}}(DATA) = T^{j}_{l,a\text{-jet}} - N_{l,a\text{-jet}} \cdot (1 - F_{hf,l}) \cdot P^{j}_{GQCD,l}
$$
$$
HFDT^{j}_{l}(DATA) = DT^{j}_{l}
$$

where P^j_GQCD,l is the probability of finding a type-j tag due to heavy flavor in a-jets recoiling
against an l-jet without heavy flavor (see Table II). The errors ε^j_T,l,i of the rates HFT^j_l,i(DATA)
also include the 10% uncertainty of P^j_GQCD,l.
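
As a numerical illustration of the subtraction applied to the away-jets, the sketch below uses the SECVTX away-jet entry of the electron data in Table II and the purity F_hf = 45.3% quoted in the caption of Table VI. The function name is hypothetical, and the error treatment (10% of the subtracted term added in quadrature, with the uncertainty of F_hf ignored) is a simplifying assumption.

```python
import math

def away_jet_hf_rate(tags, tags_err, n_away, f_hf, p_gqcd, p_rel_err=0.10):
    """Tagged away-jets recoiling against lepton-jets with heavy flavor.

    Subtracts N_a-jet * (1 - F_hf) * P_GQCD and adds the 10% uncertainty
    of P_GQCD in quadrature (the F_hf uncertainty is ignored here).
    """
    qcd = n_away * (1.0 - f_hf) * p_gqcd
    rate = tags - qcd
    err = math.sqrt(tags_err ** 2 + (p_rel_err * qcd) ** 2)
    return rate, err

# Electron data, SECVTX-tagged away-jets: Table II inputs, F_hf = 45.3%
rate, err = away_jet_hf_rate(4353.3, 68.5, 73335, 0.453, 0.0156)
print(f"{rate:.1f} +- {err:.1f}")
```

With these rounded inputs the result agrees with the 3729.0 ± 92.8 listed in Table VI to within about two jets.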

Following the procedure pioneered in Ref. [14], in which the HERWIG simulation
was tuned to generic-jet data, we constrain the following fit parameters $X_i$ to their measured
or expected values $\bar{X}_i$ using the term

$$
G_i = \frac{(X_i - \bar{X}_i)^2}{\sigma^2_{\bar{X}_i}}
$$

1. the ratio of the b and c direct production cross sections; it is constrained to the HERWIG
default value with a 14% Gaussian error to account for the uncertainty of the parton
fragmentation and for the fact that all quarks are treated as massless by the generator.

2. the ratio of the b to c flavor excitation cross sections; it is constrained to the HERWIG
default value with a 28% error to account for the uncertainty of the parton structure
functions.

3. the correction bg to the rate of gluon splitting $g \to b\bar{b}$; it is constrained to the value
1.4 ± 0.19 returned by the fit to generic-jet data [14].

4. the correction cg to $g \to c\bar{c}$; it is constrained to the value 1.35 ± 0.36 returned by the
fit to generic-jet data [14].

5. we constrain SF_b for SECVTX to unity with a 6% error.

6. we constrain SF_c to unity with a 28% error.

7. we constrain SF^JPB_b and SF^JPB_c to unity with a 6% error.

In summary the fit minimizes the function

$$
\chi^2 = \sum_{l=e,\mu} \; \sum_{j=\text{tag type}} \left[ \sum_{i=\text{jet type}}
\frac{\left( HFT^{j}_{l,i}(DATA) - HFT^{j}_{l,i} \right)^2}{(\epsilon^{j}_{T,l,i})^2 + (\delta^{j}_{T,l,i})^2}
+ \frac{\left( HFDT^{j}_{l}(DATA) - HFDT^{j}_{l} \right)^2}{(\epsilon^{j}_{DT,l})^2 + (\delta^{j}_{DT,l})^2} \right]
+ \sum_{i=1}^{7} G_i
$$

where the last sum runs over the seven constraint terms. In total we fit 12 rates with 10 free
parameters and 7 constraints. The best fit returns a $\chi^2$ value of 4.6 for 9 degrees of freedom.
The values of the parameters returned by the fit and their correlation coefficients are shown
in Tables IV and V. Tagging rates in the data and in the fitted simulation are listed in Table VI.
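
The bookkeeping of the fit can be sketched as follows. The helper names are illustrative assumptions; the example evaluates the Gaussian penalty for the two gluon-splitting corrections, using the fitted values of Table IV against the generic-jet constraints quoted above, and counts the degrees of freedom.

```python
def constraint_term(value, expected, sigma):
    """Gaussian penalty ((X_i - Xbar_i) / sigma_i)^2 added to the chi-square."""
    return ((value - expected) / sigma) ** 2

def degrees_of_freedom(n_rates, n_constraints, n_parameters):
    """Each fitted rate and each constraint adds one term; each parameter removes one."""
    return n_rates + n_constraints - n_parameters

ndof = degrees_of_freedom(n_rates=12, n_constraints=7, n_parameters=10)

# Gluon-splitting corrections: fitted value (Table IV) vs generic-jet constraint
g_bg = constraint_term(1.40, 1.40, 0.19)
g_cg = constraint_term(1.40, 1.35, 0.36)
print(ndof, g_bg, round(g_cg, 3))
```

Both penalties are small, which is consistent with a fit that does not pull the gluon-splitting corrections away from their generic-jet values.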

As shown by Table IV, the correction factors to the parton-level cross sections predicted
by HERWIG are close to unity. As also noted in Ref. [28], HERWIG predicts an inclusive
b-quark cross section at the Tevatron which is approximately a factor of two larger than
the NLO prediction [5] and is in fair agreement with the CDF and D0 measurements. As
shown in Table III, LO (labeled as direct production) and higher order (labeled as flavor
excitation and gluon splitting) terms produce events with quite different kinematics. The
LO contribution mostly consists of events which contain two jets with b (or c) flavor in
the detector acceptance. In contrast, only a small fraction of the events due to higher
order terms contains two jets with heavy flavor in the detector acceptance. Therefore,
the observed ratio of tagged a-jets to tagged l-jets constrains the relative weight of LO
and higher order contributions. In the HERWIG simulation tuned to reproduce the data,
the contribution of higher order terms is approximately a factor of three larger than the
LO contribution. The NLO prediction, which uses renormalization and factorization scales
$\mu_0 = (p_T^2 + m_b^2)^{1/2}$, underestimates the heavy flavor cross section by a factor of two and
also yields LO and NLO contributions of approximately the same size; the tuned parton-level
prediction of HERWIG indicates that the data would be better described by a NLO
calculation that uses the renormalization scale $\mu_R = 0.5 \times (p_T^2 + m_b^2)^{1/2}$ and the factorization
scale $\mu_F \simeq 0.1 \times (p_T^2 + m_b^2)^{1/2}$.

As shown by the comparison between data and tuned simulation in Table VI (rows 3
to 6), the number of events containing two jets with heavy flavor, corresponding to $\sigma_{b\bar{b}}$,
is well modeled by the HERWIG generator in which, as shown in Table III, approximately
30% of the production is due to higher-than-LO terms. Therefore the NLO prediction of
$\sigma_{b\bar{b}}$ underestimates the data by 20%, whereas, as mentioned in the introduction, the NLO
predictions of $\sigma_{b\bar{b}} \times BR$ and $\sigma_{b\bar{b}} \times BR^2$ underestimate the data by a much larger factor.

TABLE IV. Result of the fit of the HERWIG simulation to the data. The fit is described in
the text and yields $\chi^2$/DOF = 4.6/9. The rescaling factors for the gluon splitting contributions
predicted by the HERWIG parton-shower Monte Carlo are of the same size as those measured by the
SLC and LEP experiments [29], and are consistent with the estimated theoretical uncertainty [30].

SECVTX scale factor      SF_b       0.97 ± 0.03
SECVTX scale factor      SF_c       0.94 ± 0.22
JPB scale factor         SF_JPB     1.01 ± 0.02
e norm.                  K_e        1.02 ± 0.05
μ norm.                  K_μ        1.08 ± 0.06
c dir. prod.             c          1.01 ± 0.10
b flav. exc.             bf         1.02 ± 0.12
c flav. exc.             cf         1.10 ± 0.29
$g \to b\bar{b}$                  bg         1.40 ± 0.18
$g \to c\bar{c}$                  cg         1.40 ± 0.34

TABLE V. Parameter correlation coefficients.

[column headings and the rows preceding c are illegible]

c        0.053    0.020    0.008    0.002   -0.098
bf                0.245   -0.680   -0.199   -0.526
cf                        -0.321   -0.164   -0.274
bg                                 -0.029   -0.019
cg                                          -0.018

TABLE VI. Rates of tags due to heavy flavor in the data and in the fitted HERWIG simulation.
The heavy flavor purity of the lepton-jets in the data returned by the best fit is F_hf = (45.3 ± 1.9)%
in the electron sample and F_hf = (59.7 ± 3.6)% in the muon sample. The contribution of a-jets
recoiling against l-jets without heavy flavor has been subtracted; the 10% uncertainty of this
contribution is included in the errors.

                   Electrons                                 Muons
Tag type           Data                Simulation            Data               Simulation
HFT^SEC_l-jet      10115.3 ± 101.7     10156.8 ± 159.3       3657.3 ± 60.8      3636.7 ± 95.8
HFT^JPB_l-jet      11165.4 ± 115.8     11139.7 ± 159.7       4068.6 ± 66.2      4059.7 ± 95.8
HFT^SEC_a-jet      3729.0 ± 92.8       3691.5 ± 109.7        943.8 ± 35.2       967.4 ± 43.2
HFT^JPB_a-jet      4035.8 ± 139.7      3984.0 ± 111.0        1090.8 ± 44.9      1059.3 ± 42.8
HFDT^SEC           1375.2 ± 37.6       1380.8 ± 59.4         452.6 ± 21.6       474.3 ± 31.1
HFDT^JPB           1627.8 ± 43.7       1644.0 ± 57.1         546.4 ± 25.1       556.6 ± 28.7

A. Kinematics

Because of the large flavor excitation contribution, the cross section evaluated with
HERWIG depends strongly on the pseudo-rapidity and transverse momentum of the heavy
quarks in the final state. The 2 → 2 hard scattering with $p_T^{min} > 13$ GeV/c used to generate
simulated events does not cover some of the available phase space, such as the production
of massive gluons with small transverse momentum, which then branch into pairs of heavy
quarks. In addition, the detector simulation (QFL), which is based upon parametrizations
of single particle kinematics, may not accurately model the jet-$E_T$ and trigger thresholds
used in the analysis. It is therefore important to show that the simulation, which correctly
reproduces the tagging rates and the away-jet multiplicity distribution, also models the event
kinematics. Figures 1 to 4 compare transverse energy and pseudo-rapidity distributions in
the data and in the simulation, normalized according to the fit listed in Table IV (*).

(*) The systematic discrepancy in the first bin of each $E_T$ distribution is the reflection of the slightly
inaccurate modeling of the efficiency of the lepton trigger near the threshold. A few local discrepancies
in some pseudo-rapidity distributions at $|\eta| \simeq$ … and $|\eta| \simeq 1$ are due to an inaccurate
modeling of the calorimetry cracks. These small discrepancies are not relevant in this analysis.

Figure 5 compares distributions of the azimuthal angle $\delta\phi$ between the lepton-jet and the
away-jets. The region at $\delta\phi$ smaller than 1.2, which is well modeled by the tuned simulation,
is mostly populated by the gluon splitting contribution. The good agreement between data
and prediction supports the 40% increase of the gluon splitting cross sections (see Table IV).

Figure 6 compares pseudo-lifetime distributions of SECVTX tags. The pseudo-lifetime
is defined as

$$
\text{pseudo-}\tau = \frac{L_{xy} \cdot M^{SVX}}{c \cdot p_T^{SVX}}
$$

where $L_{xy}$ is the projection of the two-dimensional vector pointing from the primary vertex
to the secondary vertex on the jet direction, and $M^{SVX}$ and $p_T^{SVX}$ are the invariant mass
and the transverse momentum of all tracks forming the SECVTX tag.
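
A minimal sketch of the pseudo-lifetime calculation is given below, assuming the definition pseudo-τ = L_xy · M^SVX / (c · p_T^SVX) with L_xy in cm, the mass in GeV/c^2, and the transverse momentum in GeV/c. The function name, the choice of picoseconds as the output unit, and the example tag are assumptions made for illustration only.

```python
C_CM_PER_PS = 0.0299792458  # speed of light in cm per picosecond

def pseudo_tau_ps(l_xy_cm, mass_gev, pt_gev):
    """pseudo-tau = L_xy * M^SVX / (c * pT^SVX), returned in picoseconds."""
    if pt_gev <= 0:
        raise ValueError("pT must be positive")
    return l_xy_cm * mass_gev / (C_CM_PER_PS * pt_gev)

# Hypothetical tag: L_xy = 0.2 cm, M^SVX = 2 GeV/c^2, pT^SVX = 20 GeV/c
tau = pseudo_tau_ps(0.2, 2.0, 20.0)
print(f"pseudo-tau = {tau:.3f} ps")
```

A negative L_xy, for a vertex reconstructed behind the primary vertex along the jet direction, simply yields a negative pseudo-lifetime.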

Distributions of $M^{SVX}$ and $p_T^{SVX}$, which is sensitive to the heavy quark fragmentation,
are shown in Figures 7 and 8. In Figures 7(a) and 8(a), the simulated distributions of
SECVTX tags in lepton-jets are above the data near the $p_T$ threshold. This discrepancy
follows from the fact that the tagging efficiency in the simulation is smaller than in the
data and we take care of it with an overall multiplicative factor. This procedure does not
account for the fact that the probability that an 8 GeV/c lepton is part of a tag is also higher
in the data than in the simulation. In away-jets, where high-$p_T$ tracks are not a selection
prerequisite, there is better agreement between data and simulation.

In conclusion, our simulation, calibrated within the theoretical and experimental
uncertainties, correctly models the heavy flavor production at the Tevatron.

FIG. 1. Distributions of transverse energy, $E_T$, or momentum, $p_T$, for lepton-jets tagged by
SECVTX. (a): electrons; (b): electron-jets; (c): muons; (d): muon-jets. Jet energies are corrected
for detector effects and out-of-cone losses.

FIG. 2. Pseudo-rapidity distributions of electron (a) and muon (b) jets tagged by SECVTX.

FIG. 3. Away-jet distributions in events where the electron-jet is tagged by SECVTX. (a): a-jet
transverse energy; (b): a-jet pseudo-rapidity; (c): transverse energy of a-jets tagged by SECVTX;
(d): pseudo-rapidity of a-jets tagged by SECVTX. Jet energies are corrected for detector effects
and out-of-cone losses.

FIG. 4. Away-jet distributions in events where the muon-jet is tagged by SECVTX. (a): a-jet
transverse energy; (b): a-jet pseudo-rapidity; (c): transverse energy of a-jets tagged by SECVTX;
(d): pseudo-rapidity of a-jets tagged by SECVTX. Jet energies are corrected for detector effects
and out-of-cone losses.

FIG. 5. Distribution of the azimuthal angle $\delta\phi$ between lepton-jets tagged by SECVTX and
away-jets in the same event.

FIG. 6. Pseudo-$\tau$ distributions of electron-jets (a) and muon-jets (b) tagged by SECVTX and
for tagged away-jets in events where the electron-jet (c) or the muon-jet (d) is also tagged.

FIG. 7. Distributions of the transverse momentum (a) and invariant mass (b) of SECVTX tags
in electron-jets; (c) and (d) are analogous distributions for away-jets in events in which the e-jet is
also tagged.

FIG. 8. Distributions of the transverse momentum (a) and invariant mass (b) of SECVTX tags
in muon-jets; (c) and (d) are analogous distributions for away-jets in events in which the muon-jet
is also tagged.

IX. RATES OF SLT TAGS

Following the strategy outlined in Sec. II, we search away-jets for soft leptons (e or μ)
with $p_T > 2$ GeV/c and contained in a cone of radius 0.4 around the jet axis. We then
compare rates of away-jets containing soft lepton tags due to heavy flavor in the data and
in the simulation tuned as in Table IV. Table VII lists the following rates of away-jets with
SLT tags:

1. T^SLT_a-jet, the number of away-jets with a soft lepton tag.

2. T^SLT·SEC_a-jet (T^SLT·JPB_a-jet), the number of away-jets with an SLT tag and a SECVTX (JPB)
tag (called supertag in Ref. [1]).

The uncertainty on the number of tags due to heavy flavor in Table VII includes the 10%
error of the mistag removal. In events in which the lepton-jet does not contain heavy flavor,
the number of away-jets with an SLT tag due to heavy flavor is predicted using the average
probability Pgqcd. This average probability is estimated by weighting all the away-jets with
the parametrized probability Plf, derived in generic-jet data and described in Sec. IV. In
these events, the uncertainty of the average probability of finding a real or a fake SLT tag
is estimated to be no larger than 10%. We cross-check the estimate of these uncertainties
in Sec. X.

Rates of SLT tags in the simulation before tuning are shown in Table VIII. The un- 
certainty of the SLT efficiency is estimated to be 10% and includes the uncertainty of the 
semileptonic branching ratios [25,26]. The numbers of supertags predicted by the simulation 
are obtained by multiplying the numbers in Table VIII by the scale factor 0.85 ± 0.05. 

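Following the notation of Section VIII, rates of tagged away-jets with heavy flavor in
the fitted simulation are defined as:

$$
HFT^{SLT}_{l,a\text{-jet}} = K_l \cdot \left( HFT^{SLT}_{b\text{-dir},l,a\text{-jet}} + bf \cdot HFT^{SLT}_{b\text{-f.exc},l,a\text{-jet}} + bg \cdot HFT^{SLT}_{b\text{-gsp},l,a\text{-jet}} \right)
 + K_l \cdot \left( c \cdot HFT^{SLT}_{c\text{-dir},l,a\text{-jet}} + cf \cdot HFT^{SLT}_{c\text{-f.exc},l,a\text{-jet}} + cg \cdot HFT^{SLT}_{c\text{-gsp},l,a\text{-jet}} \right)
$$

and

$$
HFT^{SLT \cdot j}_{l,a\text{-jet}} = K_l \cdot SF^{j}_{b} \left( HFT^{SLT \cdot j}_{b\text{-dir},l,a\text{-jet}} + bf \cdot HFT^{SLT \cdot j}_{b\text{-f.exc},l,a\text{-jet}} + bg \cdot HFT^{SLT \cdot j}_{b\text{-gsp},l,a\text{-jet}} \right)
 + K_l \cdot SF^{j}_{c} \left( c \cdot HFT^{SLT \cdot j}_{c\text{-dir},l,a\text{-jet}} + cf \cdot HFT^{SLT \cdot j}_{c\text{-f.exc},l,a\text{-jet}} + cg \cdot HFT^{SLT \cdot j}_{c\text{-gsp},l,a\text{-jet}} \right)
$$

where HFT^SLT_l,a-jet is the rate of a-jets containing heavy flavor tagged by the SLT algorithm,
and HFT^SLT·j_l,a-jet is the rate of a-jets containing heavy flavor with a supertag j (SECVTX
or JPB). The errors on the simulated rates include the statistical error, the systematic
uncertainty for finding SLT tags and supertags, and the uncertainties of the parameters
(K_l, bf, bg, c, cf, cg, and SF) listed in Tables IV and VI. In the data the analogous rates
are:

$$
HFT^{SLT}_{l,a\text{-jet}}(DATA) = T^{SLT}_{l,a\text{-jet}} - N_{l,a\text{-jet}} \cdot (1 - F_{hf,l}) \cdot P^{SLT}_{GQCD,l}
$$

and

$$
HFT^{SLT \cdot j}_{l,a\text{-jet}}(DATA) = T^{SLT \cdot j}_{l,a\text{-jet}} - N_{l,a\text{-jet}} \cdot (1 - F_{hf,l}) \cdot P^{SLT \cdot j}_{GQCD,l}
$$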

TABLE VII. Number of away-jets with SLT tags due to heavy flavors in the inclusive lepton
sample. Raw counts and removed mistags are listed in parentheses. When appropriate, mistags
include fake SECVTX (JPB) contributions. Pgqcd is the probability of finding a tag due to heavy
flavor in away-jets recoiling against a lepton-jet without heavy flavor.

                     Electron data                               Muon data
Tag type                                               Pgqcd                                    Pgqcd
T^SLT_a-jet          1063.8 ± 113.0  (2097/1033.2)     0.49%     308.6 ± 34.7  (562/253.4)      0.54%
T^SLT·SEC_a-jet      356.3 ± 22.8    (444/87.7)        0.08%     69.3 ± 9.9    (92/22.7)        0.09%
T^SLT·JPB_a-jet      401.3 ± 25.3    (513/111.7)       0.13%     112.3 ± 12.3  (143/30.7)       0.14%

TABLE VIII. Rates of away-jets with SLT tag due to heavy flavors in the inclusive lepton
simulation. The data-to-simulation scale factor for the supertag efficiency is not yet applied.

Electron simulation

Tag type               b-dir         c-dir    b-f.exc   c-f.exc   b-gsp    c-gsp
HFT^SLT_a-jet          362           26       93        30        41       9
HFT^SLT·SEC_a-jet      [illegible]   1        47        2         18       -
HFT^SLT·JPB_a-jet      200           7        53        6         21       2

Muon simulation

Tag type               b-dir         c-dir    b-f.exc   c-f.exc   b-gsp    c-gsp
HFT^SLT_a-jet          82            10       21        5         9        5
HFT^SLT·SEC_a-jet      33            2        9         -         4        -
HFT^SLT·JPB_a-jet      44            3        13        3         5        2

A. Rates of soft leptons due to heavy flavor in the data and in the tuned simulation

The comparison of the yields of away-jets with SLT tags due to heavy flavor in the
data and in the tuned simulation is shown in Table IX. Table X lists the numbers of
tags in the tuned simulation split by flavor type and production mechanism, and Table XI
summarizes the different contributions to the observed number of tags. In the data there
are HFT^SLT_a-jet = 1138 ± 140 a-jets with a soft lepton tag due to heavy flavor. The ±140 error
is dominated by the 10% systematic uncertainty of the fake and generic-QCD contributions
to SLT tags; the statistical error is ±51 jets. The simulation predicts 747 ± 75 a-jets with
soft lepton tags due to $b\bar{b}$ and $c\bar{c}$ production (most of the error is systematic and due to
the 10% uncertainty on the SLT tagging efficiency). The discrepancy is a 2.5σ systematic
effect.

The comparison of the yields of supertags in the data and in the tuned simulation is also
listed in Table XI. The subset of data in which a-jets have both SLT and JPB tags due to
heavy flavor contains 453 ± 29 supertags (in this case the ±25 statistical error is larger than
the ±15 systematic error due to the fake-tag subtraction). The simulation predicts 317 ± 25
a-jets with a supertag due to $b\bar{b}$ and $c\bar{c}$ production. The ±25 systematic error is obtained by
combining in quadrature the uncertainty of the SLT efficiency (±16) with the uncertainty
(±20) due to the fit in Table IV and to the simulation statistical error. This discrepancy is
a 3.5σ effect dominated by systematic uncertainties. In the even smaller subset of events
in which a-jets contain both SECVTX and SLT tags due to heavy flavor, the discrepancy
between data and simulation is a 2.4σ effect, also dominated by the same systematic errors.
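
The sizes of the three discrepancies can be checked with the short sketch below, which takes the data and simulation yields of Table XI and adds their errors in quadrature. Treating the two errors as uncorrelated is an assumption made here for illustration; it reproduces the "Excess" row of Table XI and significances close to the quoted 2.5σ, 2.4σ and 3.5σ.

```python
import math

def excess(data, data_err, sim, sim_err):
    """Return (excess, error, significance) with errors added in quadrature."""
    diff = data - sim
    err = math.hypot(data_err, sim_err)
    return diff, err, diff / err

# (data, data error, simulation, simulation error) from Table XI
rows = {"SLT": (1138, 140, 747, 75),
        "SLT+SECVTX": (386, 26, 296, 26),
        "SLT+JPB": (453, 29, 317, 25)}

for name, values in rows.items():
    diff, err, nsigma = excess(*values)
    print(f"{name}: excess {diff} +- {err:.0f}, {nsigma:.2f} sigma")
```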

There is no gain in combining the three results because the uncertainties on the number
of a-jets with SLT tags due to heavy flavor, before and after tagging with the SECVTX
and JPB algorithms, are highly correlated. Away-jets with supertags are a subset of the
a-jets with SLT tags, and there is overlap between the subsets with JPB and SECVTX
supertags. However, it is important to note that the discrepancy between observed and
expected number of SLT tags is of the same size before and after tagging with the SECVTX
and JPB algorithms. This disfavors the possibility that the disagreement between data and
simulation arises from jets containing hadrons with a lifetime much shorter than that of
conventional heavy flavor.

We have considered the impact on the number of expected supertags due to the 0.85 ±
0.05 scale factor derived in generic-jet data. If we had evaluated the number of simulated
supertags using the product of simulated efficiencies of the SECVTX (JPB) algorithm and
of the SLT algorithm, which has a 10% uncertainty, the discrepancy between data and
simulation would be smaller: 1.6σ and 1.0σ for a-jets with JPB and SECVTX tags,
respectively. However, analogous rates of tags in generic-jet data would be approximately
1.5σ lower than in the simulation. Figure 9 shows the yield of R, the ratio of the number of
supertags (SECVTX+SLT) to that of SECVTX tags produced by heavy flavor, in generic
jets and in the away-jets recoiling against a lepton-jet. The ratio R' is derived in analogy,
replacing SECVTX with JPB tags. The comparison of these ratios in the generic-jet data
and their simulation has been used in Ref. [1] to calibrate the efficiency for finding supertags
in the simulation. In Figure 9, the efficiency for finding supertags in the simulation has not
been corrected with the 0.85 ± 0.05 scale factor. For the simulation, the plotted errors of
R (R') account for the uncertainty of the relative contribution of b and c quarks, but not
for the uncertainty of the supertag efficiency, which is no smaller than 10%. One notes that
the simulation predicts the same value of R (R') for generic jets and away-jets in lepton-triggered
events, whereas, in the data, the value of R (R') for away-jets is approximately
20% higher than for generic jets.

Finally, we have investigated the dependence of the predicted yield of away-jets with
SLT tags on the ratio of the $c\bar{c}$ to $b\bar{b}$ productions predicted by the simulation. To a good
approximation, the predicted yield does not depend on the tuning of the simulation. Since
the ratio of the tagging efficiency for c jets to that for b jets is approximately equal for the
JPB and SLT algorithms (*), the expected number of away-jets with SLT tags is

$$
HFT^{SLT}_{a\text{-jet}} = \epsilon^{SLT}_{b} \times \left( N_b + \frac{\epsilon^{SLT}_{c}}{\epsilon^{SLT}_{b}} \times N_c \right)
\simeq \frac{\epsilon^{SLT}_{b}}{\epsilon^{JPB}_{b}} \times \epsilon^{JPB}_{b} \times \left( N_b + \frac{\epsilon^{JPB}_{c}}{\epsilon^{JPB}_{b}} \times N_c \right)
$$
$$
= \frac{\epsilon^{SLT}_{b}}{\epsilon^{JPB}_{b}} \times HFT^{JPB}_{a\text{-jet}}(DATA) = \frac{\epsilon^{SLT}_{b}}{\epsilon^{JPB}_{b}} \times (5126.6 \pm 146.7) = 763 \pm 80
$$

and does not depend on the size of $N_b$ and $N_c$, the numbers of away-jets attributed by the fit
to bottom and charmed flavor, respectively. As an example of this, without constraining the
ratio of the c to b direct productions to the nominal value within a 14% error, we have induced
the fit to return a very different, and incorrect, local minimum (c = 2.8 ± 1.6 instead of
c = 1.01 ± 0.10 in Table IV). The number of a-jets with SLT tags remains approximately
constant (in the electron sample, 598 ± 69 becomes 603 ± 66; in the muon sample, 149 ± 21
becomes 156 ± 21).
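
The estimate above can be reproduced numerically with the sketch below. It scales the JPB-tagged away-jet rate in the data (electrons plus muons, Table VI) by the ratio of the average b-jet efficiencies, 0.064 for SLT and 0.43 for JPB. Adding the 10% SLT efficiency uncertainty in quadrature to the relative error of the JPB rate is an assumption about how the ±80 is obtained; the function and constant names are illustrative.

```python
import math

EPS_JPB_B = 0.43    # average JPB tagging efficiency for b jets
EPS_SLT_B = 0.064   # average SLT tagging efficiency for b jets

def slt_yield_from_jpb(jpb_rate, jpb_err, slt_rel_err=0.10):
    """Scale the JPB-tagged away-jet rate by eps_b^SLT / eps_b^JPB."""
    expected = (EPS_SLT_B / EPS_JPB_B) * jpb_rate
    rel_err = math.hypot(jpb_err / jpb_rate, slt_rel_err)
    return expected, expected * rel_err

# JPB-tagged away-jets with heavy flavor in the data, electrons + muons (Table VI)
jpb_rate = 4035.8 + 1090.8
jpb_err = math.hypot(139.7, 44.9)
expected, err = slt_yield_from_jpb(jpb_rate, jpb_err)
print(f"{expected:.0f} +- {err:.0f}")
```

Because only the b-jet efficiencies enter, the result is insensitive to how the fit divides the away-jets between bottom and charm, which is the point made in the text.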

TABLE IX. Number of a-jets with an SLT tag due to heavy flavor decay. The contribution of
a-jets recoiling against l-jets without heavy flavor has been subtracted (see text).

                       Electrons                             Muons
Tag type               Data              Simulation          Data             Simulation
HFT^SLT_a-jet          865.1 ± 114.8     597.6 ± 69.3        272.7 ± 34.9     149.3 ± 21.0
HFT^SLT·SEC_a-jet      322.6 ± 23.3      242.4 ± 22.5        63.3 ± 9.9       53.8 ± 8.7
HFT^SLT·JPB_a-jet      350.2 ± 26.3      251.5 ± 21.7        103.2 ± 12.4     65.0 ± 8.9

(*) The average tagging efficiencies in this data set are ε^JPB_b = 0.43, ε^JPB_c = 0.30, ε^SLT_b = 0.064,
and ε^SLT_c = 0.046.

TABLE X. Tagging rates in the normalized simulation listed by production mechanisms.

Electron simulation

Tag type             b-dir             c-dir             b-f.exc             c-f.exc           b-gsp              c-gsp
HF_l-jet             5781.0 ± 320.8    973.2 ± 109.8     11247.8 ± 1073.9    3115.7 ± 790.1    7504.6 ± 1081.6    2411.0 ± 593.8
HF_a-jet             5961.4 ± 330.6    1004.0 ± 113.2    11770.6 ± 1123.6    3257.7 ± 826.0    8591.1 ± 1237.4    2677.8 ± 659.2
HFT^SEC_l-jet        2267.5 ± 101.6    49.1 ± 19.4       4505.5 ± 451.7      199.5 ± 81.7      2942.4 ± 408.7     192.8 ± 87.8
HFT^JPB_l-jet        2358.3 ± 99.0     162.0 ± 20.7      4572.8 ± 454.2      651.1 ± 167.3     2904.3 ± 404.4     491.2 ± 122.2
HFT^SEC_a-jet        2542.0 ± 112.3    86.0 ± 33.1       596.8 ± 65.0        69.9 ± 29.4       377.1 ± 57.5       19.7 ± 10.2
HFT^JPB_a-jet        2585.1 ± 107.3    201.8 ± 24.8      589.4 ± 62.8        147.1 ± 39.4      380.6 ± 57.1       80.0 ± 22.1
HFDT^SEC             981.1 ± 52.5      4.3 ± 3.6         232.5 ± 31.4        3.8 ± 3.3         157.9 ± 27.8       1.2 ± 1.5
HFDT^JPB             1032.7 ± 45.8     41.3 ± 7.5        295.7 ± 36.0        26.2 ± 8.5        224.1 ± 35.0       24.0 ± 8.1
HFT^SLT_a-jet        369.0 ± 46.2      26.7 ± 6.6        97.0 ± 16.7         33.6 ± 11.0       58.5 ± 13.7        12.8 ± 5.5
HFT^SLT·SEC_a-jet    164.1 ± 17.4      0.8 ± 0.9         49.7 ± 9.2          1.7 ± 1.4         26.0 ± 7.2         -
HFT^SLT·JPB_a-jet    167.6 ± 16.6      5.9 ± 2.3         45.5 ± 8.1          5.5 ± 2.7         24.6 ± 6.5         2.3 ± 1.8

Muon simulation

Tag type             b-dir             c-dir             b-f.exc             c-f.exc           b-gsp              c-gsp
HF_l-jet             1383.7 ± 84.4     323.5 ± 39.6      2798.6 ± 292.4      1112.8 ± 285.4    2191.5 ± 310.5     1125.7 ± 284.9
HF_a-jet             1462.3 ± 88.7     339.8 ± 41.4      2981.5 ± 311.2      1174.2 ± 301.0    2572.5 ± 363.5     1229.7 ± 310.9
HFT^SEC_l-jet        730.0 ± 42.3      33.9 ± 14.0       1485.2 ± 164.3      90.1 ± 38.0       1170.0 ± 161.8     127.5 ± 59.3
HFT^JPB_l-jet        736.3 ± 39.0      80.8 ± 12.3       1477.5 ± 160.9      261.6 ± 69.0      1209.1 ± 166.3     294.4 ± 76.0
HFT^SEC_a-jet        638.9 ± 38.4      28.9 ± 12.1       173.3 ± 23.8        14.1 ± 7.0        96.9 ± 18.4        15.2 ± 8.3
HFT^JPB_a-jet        653.0 ± 35.6      65.1 ± 10.5       184.4 ± 24.0        38.8 ± 11.9       87.4 ± 16.2        30.6 ± 10.1
HFDT^SEC             333.2 ± 26.2      2.8 ± 2.5         92.3 ± 16.1         2.0 ± 2.0         42.8 ± 11.1        1.3 ± 1.6
HFDT^JPB             349.5 ± 22.0      12.2 ± 3.7        108.3 ± 16.3        7.7 ± 3.5         70.4 ± 13.6        8.5 ± 4.0
HFT^SLT_a-jet        88.3 ± 14.0       10.9 ± 3.8        23.1 ± 6.0          5.9 ± 3.1         13.6 ± 5.1         7.5 ± 3.9
HFT^SLT·SEC_a-jet    36.0 ± 6.8        1.7 ± 1.4         10.0 ± 3.6          -                 6.1 ± 3.2          -
HFT^SLT·JPB_a-jet    38.9 ± 6.5        2.7 ± 1.6         11.8 ± 3.6          2.9 ± 1.8         6.2 ± 2.9          2.5 ± 1.9
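
As a consistency check on Table X, the sketch below sums the HF_l-jet row over the six production mechanisms and divides by the number of lepton-jets in the data (Table II). Under the definition F_hf = HF_l-jet / N_l-jet used in the fit, this should return the heavy flavor purities quoted in the caption of Table VI. The dictionary names are illustrative assumptions.

```python
# HF_l-jet row of Table X: b-dir, c-dir, b-f.exc, c-f.exc, b-gsp, c-gsp
HF_LEPTON_JETS = {
    "e":  [5781.0, 973.2, 11247.8, 3115.7, 7504.6, 2411.0],
    "mu": [1383.7, 323.5, 2798.6, 1112.8, 2191.5, 1125.7],
}
N_LEPTON_JETS = {"e": 68544, "mu": 14966}  # lepton-jets in the data (Table II)

def heavy_flavor_purity(lepton):
    """F_hf = HF_l-jet / N_l-jet for the given lepton sample."""
    return sum(HF_LEPTON_JETS[lepton]) / N_LEPTON_JETS[lepton]

for lepton in ("e", "mu"):
    print(f"F_hf({lepton}) = {100 * heavy_flavor_purity(lepton):.1f}%")
```

The two values come out close to the 45.3% and 59.7% quoted for the electron and muon samples.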
TABLE XI. Summary of the observed and predicted numbers of a-jets with SLT tags or supertags
in the inclusive lepton sample. Mistags are the expected fake-tag contributions in a-jets
recoiling against l-jets with heavy flavor (h.f.). QCD are the predicted numbers of tags, which
include mistags, in a-jets recoiling against l-jets without heavy flavor. HFT_a-jet (data and
h.f. simulation) are the numbers of tagged a-jets with heavy flavor recoiling against l-jets with
heavy flavor; in the data, this contribution is obtained by subtracting the second and third
rows of this table from the first one.



Tag type                       SLT           SLT+SECVTX    SLT+JPB

Observed                       2659          536           656
Mistag                         619 ± 62      53 ± 5        69 ± 7
QCD                            902 ± 91      97 ± 10       134 ± 13
HFT_a-jet (data)               1138 ± 140    386 ± 26      453 ± 29
HFT_a-jet (h.f. simulation)    747 ± 75      296 ± 26      317 ± 25
Excess                         391 ± 159     90 ± 37       136 ± 38
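
The last row of Table XI can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the uncertainties of the data and simulation rows are independent and add in quadrature, which reproduces the quoted excess uncertainties but is not stated explicitly in the table. The names `hft_data`, `hft_sim`, and `excess` are introduced here for illustration.

```python
import math

# Rows of Table XI as (value, uncertainty) for each tag type.
hft_data = {"SLT": (1138, 140), "SLT+SECVTX": (386, 26), "SLT+JPB": (453, 29)}
hft_sim = {"SLT": (747, 75), "SLT+SECVTX": (296, 26), "SLT+JPB": (317, 25)}


def excess(data, sim):
    """Return (data - sim, uncertainty), assuming independent errors
    combined in quadrature (an assumption of this sketch)."""
    diff = data[0] - sim[0]
    err = math.hypot(data[1], sim[1])
    return diff, err


if __name__ == "__main__":
    for tag in hft_data:
        diff, err = excess(hft_data[tag], hft_sim[tag])
        print(f"{tag:11s} excess = {diff:4d} +/- {err:5.1f} ({diff / err:.1f} sigma)")
```

Under this assumption the significances fall between roughly 2.5 and 3.5 standard deviations, and the three selections are strongly correlated, so they should not be combined.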




FIG. 9. Yield of R, the ratio of the number of jets with a SECVTX and SLT tag to that with
a SECVTX tag, in the data (square) and the corresponding simulations (open square). R' is the
analogous ratio for JPB tags. The error in the simulation comes from the uncertainty of the relative
ratio of bottom and charmed hadrons in the data; this uncertainty results from the tuning of the
heavy flavor cross sections predicted by HERWIG to model the rates of SECVTX and JPB tags
observed in the data. The simulation is not corrected for the scale factor 0.85 ± 0.05, which is used
to equalize data and prediction in generic jets.



Quantities shown: R (jet data), R (jet sim), R' (jet data), R' (jet sim), R (a-jet data),
R (a-jet sim), R' (a-jet data), R' (a-jet sim).
X. SYSTEMATICS



This section reviews and verifies systematic effects that could reduce the discrepancy
between observed and predicted numbers of away-jets with a soft lepton tag due to heavy
flavor. The discrepancy depends on the estimate of the mistag rate in the data and on
the simulated efficiency of the SLT algorithm, and also on the size of the bb̄ contribution
in the simulation. We cross-check these estimates in subsections A and B, respectively. In
subsection C, we verify the discrepancy between data and simulation found in this study
with a sample of jets that recoil against J/ψ mesons arising from B decays.

A. Fake SLT tags and the simulated SLT efficiency 

Table XI shows an excess of 391 away-jets with SLT tags due to heavy flavor with respect
to the number, 747 ± 75, predicted by the heavy flavor simulation. In the data, we have
removed a fake contribution of 619 ± 62 SLT tags ^. If the estimate of the fake rate were
increased by 60% (6 times the estimated uncertainty), this excess would disappear. The
simulated efficiency of the SLT algorithm has been tuned using the data and we estimate
its uncertainty to be 10%; if the simulated efficiency were instead increased by 50%,
the disagreement between data and simulation would also disappear.

Table XI also shows an excess of 137 a-jets with SLT+JPB supertags due to heavy flavor
with respect to the number 316 ± 25 predicted by the simulation. In the data, we have
removed 142 ± 14 fake tags; in this case, one would need to increase the fake-rate estimate
by 10 σ in order to cancel the excess in the data. The simulated supertag efficiency has
been calibrated with generic-jet data to a 6% accuracy; in order to cancel the discrepancy,
the supertag efficiency in the simulation should be increased by 8.7 σ.

^In the data, we have also subtracted the generic-jet contribution of SLT tags due to a-jets recoiling
against l-jets without heavy flavor (see Table XI). This contribution is slightly overestimated
because the tagging probability P-'^* has been constructed using also events in which both jets
contain heavy flavor.

We verify the uncertainty of the fake rate and heavy flavor contributions by comparing
rates of SLT tags in three generic-jet samples to their corresponding simulations fitted to
the data using rates of SECVTX and JPB tags. These rates of tags, together with the
fake contributions evaluated with the same fake parametrizations used in the present study,
are listed in Table XII, which is derived from the study presented in Ref. [1]. A summary
of Table XII is presented in Table XIII. The observed number of SLT tags in generic jets
(sample A in Table XIII) is dominated by the fake contribution, and we use the difference
between the observed number of SLT tags and the number of SLT tags due to heavy flavor
predicted by the simulation to reduce the uncertainty of the fake rate. Generic-jet data
contain 18885 SLT tags. The parametrized probability predicts 15570 ± 1557 fake tags. The
simulation predicts 3102 SLT tags due to heavy flavor with a 13% uncertainty (dominated
by the 10% uncertainty of the SLT tagging efficiency). By removing from the data the heavy
flavor contribution predicted by the simulation, one derives an independent and consistent
estimate for the fake contribution of 15783 ± 403 SLT tags. The latter determination of the
fake contribution has a 2.6% uncertainty.
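
The arithmetic behind this independent fake-rate estimate can be sketched as follows. This is a minimal illustration, assuming (as the quoted ± 403 suggests) that only the 13% uncertainty of the simulated heavy flavor yield is propagated and that the statistical error of the observed count is neglected; the variable names are introduced here for illustration.

```python
# Numbers quoted in the text for sample A (generic jets).
observed_slt = 18885        # observed SLT tags
param_fakes = (15570, 1557) # fakes from the parametrized probability
sim_hf = 3102               # simulated SLT tags due to heavy flavor
sim_hf_rel_unc = 0.13       # relative uncertainty of the simulated yield

# Independent fake estimate: data minus simulated heavy flavor.
fake_estimate = observed_slt - sim_hf
# Assumption of this sketch: the uncertainty comes only from the simulation.
fake_unc = sim_hf * sim_hf_rel_unc
fake_rel_unc = fake_unc / fake_estimate

# Difference between the two fake estimates in units of the larger uncertainty.
shift_in_sigma = (fake_estimate - param_fakes[0]) / param_fakes[1]

if __name__ == "__main__":
    print(f"fake estimate = {fake_estimate} +/- {fake_unc:.0f}")
    print(f"relative uncertainty = {100 * fake_rel_unc:.1f}%")
    print(f"shift w.r.t. parametrization = {shift_in_sigma:.2f} sigma")
```

The two estimates differ by about 0.14 of the uncertainty of the parametrized prediction, which is why the text calls them consistent.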

Before tagging with the SLT algorithm, away-jets in the inclusive lepton sample have a
larger heavy flavor content (~26%) than that of sample A in Table XIII (~13%). However,
generic jets tagged by the SECVTX and JPB algorithms (samples B and C, respectively) have
a heavy-flavor purity of 78% and 58%, respectively. Because these latter samples have a
larger heavy flavor content, the discrepancy between the observed and predicted yields of
away-jets with SLT tags observed in the present study cannot arise from deficiencies of the
heavy flavor simulation or from an increase of the fake probability in jets with heavy flavor.

In addition, the total number of SLT tags observed in generic jets can be used to achieve
a better determination of the sum of the predicted numbers of fake SLT tags plus SLT tags
due to heavy flavor (h.f.) with respect to that presented in Sec. IX A. To obtain this, we
fit the observed rate of SLT tags in both samples A and C with the predicted number of
fake and h.f. tags weighted with unknown parameters P_f and P_h.f., respectively. The data
constrain the parameter values to be P_f = 1.017 ± 0.013 and P_h.f. = 0.981 ± 0.045 with a
correlation coefficient ρ = -0.77.

After having removed the contribution of events in which the lepton-jet does not contain
heavy flavor, away-jets contain 1757 ± 104 SLT tags; in Sec. IX A, this number was compared
to a prediction of 619 ± 62 fake and 747 ± 75 h.f. tags. When using the weights, errors and
parameter correlation derived using generic jets, the prediction of the total number of SLT
tags becomes 1362 ± 28. The systematic uncertainty of the prediction is reduced by a factor
of 2.8 with respect to that presented in Sec. IX A, while the disagreement remains the same.
In conclusion, the discrepancy observed in this study cannot arise from obvious deficiencies
of the prediction.
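
The reweighted prediction quoted above follows from standard propagation of two correlated parameters. The sketch below is illustrative and assumes that only the fitted weights, their errors, and their correlation coefficient enter the uncertainty; the function name `weighted_prediction` and its argument names are hypothetical.

```python
import math


def weighted_prediction(n_fake, n_hf, p_f, p_hf, sig_f, sig_hf, rho):
    """Total predicted tags p_f*n_fake + p_hf*n_hf and its uncertainty,
    propagating only the two fitted weights and their correlation."""
    total = p_f * n_fake + p_hf * n_hf
    a = n_fake * sig_f    # contribution of the fake weight
    b = n_hf * sig_hf     # contribution of the heavy flavor weight
    variance = a * a + b * b + 2.0 * rho * a * b
    return total, math.sqrt(variance)


if __name__ == "__main__":
    total, err = weighted_prediction(619, 747, 1.017, 0.981, 0.013, 0.045, -0.77)
    print(f"predicted SLT tags = {total:.0f} +/- {err:.0f}")
```

The negative correlation between the two weights is what makes the uncertainty of the sum smaller than either naive estimate: an upward shift of the fake rate is compensated by a lower heavy flavor yield.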
TABLE XII. Number of tags due to heavy flavors in three samples of generic jets [31] and in
their tuned simulation. The amount of mistags removed from the data is indicated in parentheses;
errors include a 10% uncertainty in the mistag evaluation. The yields of tags in the simulation
have been corrected with the appropriate scale factors (see Sec. IV). The error of the number of
simulated SLT tags includes the 10% uncertainty of the SLT tagging efficiency in the simulation; the
simulation efficiency for finding supertags (SLT+SECVTX and SLT+JPB) has been empirically
reduced by 15% to reproduce generic-jet data with a 6% accuracy.

JET 20 (194,009 events)

Tag type        Data (removed fakes)      Simulation
SECVTX          4058 ± 92 (616.0)         4052 ± 143
JPB             5542 ± 295 (2801.0)       5573 ± 173
SLT             1032 ± 402 (3962.0)       826 ± 122
SLT+SECVTX      219.8 ± 20 (94.2)         223 ± 16
SLT+JPB         287.3 ± 28 (166.7)        280 ± 19

JET 50 (151,270 events)

Tag type        Data (removed fakes)      Simulation
SECVTX          5176 ± 158 (1360.0)       5314 ± 142
JPB             6833 ± 482 (4700.0)       6740 ± 171
SLT             1167 ± 530 (5241.0)       1116 ± 111
SLT+SECVTX      347 ± 29 (169.0)          343 ± 23
SLT+JPB         427.5 ± 42 (288.5)        416 ± 27

JET 100 (129,434 events)

Tag type        Data (removed fakes)      Simulation
SECVTX          5455 ± 239 (2227.0)       5889 ± 176
JPB             6871 ± 659 (6494.0)       7263 ± 202
SLT             1116 ± 642 (6367.0)       1160 ± 168
SLT+SECVTX      377.6 ± 36 (243.4)        432 ± 29
SLT+JPB         451.8 ± 55 (401.2)        478 ± 32
TABLE XIII. Number of SLT tags in all generic jets listed in Table XII (sample A) and in
away-jets recoiling against a lepton-jet with heavy flavor (sample D). Samples B and C are generic
jets tagged with the SECVTX and JPB algorithms, respectively. Before tagging with the SLT
algorithm, the heavy flavor purity is 13% for sample A, 78% for sample B, 58% for sample C, and 26%
for the sample D used in this study. The prediction of the fake SLT rate is calculated with the
same parametrized probability for all samples; the heavy flavor (h.f.) contributions are predicted
with the same simulation.

Sample                                Number of SLT tags    Predicted fakes    Predicted h.f.
A: JET 20 + JET 50 + JET 100          18885                 15570 ± 1557       3102 ± 403
B: generic jets with SECVTX tags      1451                  507 ± 51           998 ± 60
C: generic jets with JPB tags         2023                  856 ± 86           1174 ± 71
D: away-jets                          1757                  619 ± 62           747 ± 75
We have investigated the possibility that the rate of fake SLT tags might be higher in
jets with heavy flavor than in jets due to light partons. The correlation between the fake
and h.f. predictions, established by the previous comparison between the total number of
observed and predicted tags in generic jets, would require that an increase of the fake rate
be compensated by a smaller efficiency of the SLT algorithm in the simulation, and it would
not reduce the disagreement between data and prediction observed in the inclusive lepton
sample. However, it is of interest to show this study in anticipation of the next subsection.

The parametrization of the SLT fake rate has been derived in generic-jet data without
distinguishing between muons faked by hadrons not contained by the calorimeter and muons
produced by in-flight decays of π and K mesons. The second contribution is believed to be
small because the reconstruction algorithms reject tracks which exhibit large kinks, but this
has never been carefully checked. Away-jets in the inclusive lepton sample have a larger
heavy flavor content (~26%) than the generic jets used to determine the SLT fake rate
(~13%), and possibly a larger kaon content. Since kaons have a shorter lifetime than pions,
in-flight decays of kaons could increase the SLT fake rate in the inclusive lepton sample with
respect to generic-jet data. We verify the contribution of kaon in-flight decays by using a
combination of data and simulation. First we extend the simulation of the SLT algorithm
to match tracks not only to leptons originating from heavy quark decays at generation level
but also to muons originating from kaon decays at detector simulation level. With this
implementation, the rate of SLT tags in the simulation increases by only 1% (from 746.9 to
754.4 tags).

We check the simulation result within a factor of two by selecting D⁰ → Kπ decays
in the data and in the tuned simulation. As done in previous analyses [32], we search the
inclusive lepton sample for D⁰ → K⁻π⁺ decays near the trigger leptons. To increase the
sample statistics we do not require that leptons are contained in a jet with transverse energy
larger than 15 GeV. The D⁰ → K⁻π⁺ decays are reconstructed as follows. We select events
in which a cone of radius 0.6 around the lepton direction contains only two SVX tracks
with opposite charge, p_T > 1.0 GeV/c, and an impact parameter significance larger than
two ^. We reconstruct the two-track invariant mass attributing the kaon mass to the track
with the same charge as the lepton, as is the case in semileptonic B decays. The resulting
K⁻π⁺ invariant mass spectrum is shown in Figure 10 together with a polynomial fit to the
background which ignores the mass region between 1.7 and 2.0 GeV/c². According to the
fit, in the mass range 1.82-1.92 GeV/c² the simulation contains 563 D⁰ mesons on top of a
background of 95 events (the corresponding 563 kaons are also identified at generator level).
We find that one kaon in 563 D⁰ decays produces a soft muon tag, which corresponds to
0.0018 SLT tags per kaon.

The data contain 1117 K⁻π⁺ pairs in the mass range 1.82-1.92 GeV/c² (891 are
attributed by the fit to D⁰ mesons and 226 to the background). The 1117 kaon tracks
produce 6 SLT tags. The contribution of the background is estimated from the side-bands
(1.64-1.74 and 2.0-2.1 GeV/c²) to be 3.8 ± 1.0 events. It follows that 891 kaons from
D⁰ decays produce 2.2 ± 2.6 SLT tags. The fraction of SLT tags per kaon, 0.0024 ± 0.0029,
includes the fake-tag contribution, and is consistent with the small fraction predicted by
the simulation. We conclude that in-flight decays of K mesons are a negligible background
contribution.
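
The side-band subtraction in this check can be reproduced with a short calculation. The sketch below is illustrative: it assumes that the uncertainty of the subtracted yield combines the Poisson error of the observed count with the quoted background uncertainty in quadrature, which matches the quoted ± 2.6 but is not spelled out in the text. The function name `background_subtracted` is hypothetical.

```python
import math


def background_subtracted(n_observed, background, background_err):
    """Signal yield and uncertainty after background subtraction,
    assuming a Poisson error on n_observed (an assumption of this sketch)."""
    signal = n_observed - background
    err = math.sqrt(n_observed + background_err ** 2)
    return signal, err


# Data: 6 SLT tags from 1117 kaon candidates, 3.8 +/- 1.0 from side-bands.
tags, tags_err = background_subtracted(6, 3.8, 1.0)
n_d0_kaons = 891
data_rate = tags / n_d0_kaons
data_rate_err = tags_err / n_d0_kaons

# Simulation: one tagged kaon in 563 D0 decays.
sim_rate = 1 / 563

if __name__ == "__main__":
    print(f"tags from D0 kaons = {tags:.1f} +/- {tags_err:.1f}")
    print(f"data rate = {data_rate:.4f} +/- {data_rate_err:.4f}")
    print(f"simulated rate = {sim_rate:.4f}")
```

The computed data rate agrees with the quoted 0.0024 ± 0.0029 up to rounding, and the simulated rate lies well within one standard deviation of it.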



^The impact parameter is the distance of closest approach to the primary vertex in the transverse 
plane. 
FIG. 10. Distributions of the Kπ invariant mass, M. The solid line is a polynomial fit to the
distributions excluding the window between 1.7 and 2.0 GeV/c².



B. b purity of the data sample 



The discrepancy between observed and predicted number of a-jets with SLT tags due to
heavy flavor would be reduced if the bb̄ contribution were underestimated by the simulation.
In this section, we verify that the bb̄ contribution is predicted correctly. As shown in Table X,
the inclusive electron simulation predicts that 79% of the away-jets with heavy flavor are
due to bb̄ production. This table also shows that the fraction of away-jets with an SLT tag
is higher in events due to bb̄ production (2%) than in events due to cc̄ production (1%). If
one had a reason to increase the b purity in the simulation from 79% to 100%, one could
increase the predicted number of a-jets with an SLT tag in Table IX from 598 to 756, which is
closer to the 865 ± 115 a-jets with an SLT tag due to heavy flavor in the data. We provide an
independent check of the b purity of the inclusive lepton sample by comparing the number
of D⁰, D±, and J/ψ mesons from B decays which are contained in lepton-jets in the data
and in the normalized simulation.

1. l⁻D⁰ and l⁺D⁻ candidates

We identify l⁻D⁰ candidates by searching for D⁰ → K⁻π⁺ decays inside the lepton-jet, as
explained in the previous section. In a similar way, we identify l⁺D⁻ pairs by searching for
D⁻ → K⁺π⁻π⁻ decays inside the lepton-jet. In this case, we select jets containing one
positive and two negative tracks with p_T > 0.6 GeV/c and impact parameter significance
larger than 2.5 in a cone of radius 0.6 around the jet axis. When reconstructing the three-track
invariant mass, we attribute the kaon mass to the track with the same charge as the lepton,
as is the case in semileptonic B decays.

Figure 11 shows the invariant mass distributions of D⁰ and D± candidates found in the
data and in the fitted simulation. By comparing with Figure 10, one notes that the mass
resolution is degraded when using tracks inside a jet and is degraded slightly differently in
the data and in the simulation.

There are 83510 lepton-jets in the data with an estimated heavy flavor purity F_hf =
(47.9 ± 2.0)%. The simulation normalized according to Table IV contains 39989 lepton-jets
with heavy flavor. In the mass range 1.82-1.92 GeV/c², we find 205 D⁰ candidates
in the data and 195.5 D⁰ candidates in the simulation. By fitting the side-bands with a
polynomial function (solid line in Figure 11), we evaluate a background of 79.6 ± 6.0 events
in the data and of 55.6 ± 5.5 events in the simulation. After background subtraction, there
are 126.0 ± 15.5 D⁰ mesons in the data and 139.9 ± 15.0 D⁰ mesons in the simulation.

In the mass range 1.82-1.92 GeV/c², there are 216 D± candidates in the data and
159.2 in the simulation. By fitting the side-bands with a polynomial function we estimate
a background of 142.3 ± 10.0 events in the data and of 90.7 ± 6.4 events in the simulation.
After background subtraction we find 73.7 ± 17.8 D± mesons in the data and 68.5 ± 14.1 in
the simulation. From the ratio of the numbers of lD candidates, we derive that the ratio of
the bb̄ production in the simulation to that in the data is 1.09 ± 0.15.

2. J/ψ candidates

We look for J/ψ candidates by searching the electron- or muon-jet for additional soft
lepton tags with the same flavor and opposite charge. Dileptons with invariant mass 2.6 <
m_ee < 3.6 GeV/c² and 2.9 < m_μμ < 3.3 GeV/c² are considered J/ψ candidates (Dil_ψ).
Dil_ψ^SEC and Dil_ψ^JPB are the numbers of J/ψ candidates in lepton-jets tagged by SECVTX
and JPB, respectively. We use the number of same-sign (SS) dileptons with a 10% error to
estimate and remove the background to opposite-sign (OS) dileptons due to misidentified
leptons [33].

Figure 12 compares invariant mass distributions of same flavor dileptons including J/ψ
mesons in the data and in the simulation (in the simulation J/ψ mesons are only produced
by B decays). Rates of J/ψ mesons in the data and in the normalized simulation are listed in
Table XIV. One notes that the simulation contains a number of J/ψ mesons in jets tagged
by SECVTX or JPB which is slightly higher than, but consistent with, the data. Before
tagging, the rate of J/ψ mesons in the data is 20% larger than in the simulation, whereas
it was expected to be larger by a factor of two according to the CDF measurement of the
fraction of J/ψ's coming from B decays [34]. This would happen if the bb̄ cross section had
been overestimated in normalizing the simulation.

After combining the ratio of lD candidates in the data to that in the simulation with the
ratio of lJ/ψ candidates with a JPB tag listed in Table XIV, we estimate that the ratio of
the bb̄ production in the simulation to that in the data is 1.09 ± 0.11. This ratio is consistent
with unity, and does not support the possibility that the b purity in the fitted simulation is
underestimated by 21%.

TABLE XIV. Number of J/ψ mesons identified in the data and in the fitted simulation.

                Electrons                         Muons
Tag type        Data            Simulation        Data           Simulation
Dil_ψ           176.0 ± 14.4    155.2 ± 21.5      83.0 ± 9.4     54.0 ± 10.1
Dil_ψ^SEC       57.8 ± 8.8      71.8 ± 10.7       31.9 ± 5.8     28.7 ± 6.2
Dil_ψ^JPB       61.2 ± 8.4      68.9 ± 9.4        29.6 ± 5.7     33.0 ± 6.4
FIG. 11. Invariant mass distributions of D⁰ candidates in the data (a) and in the simulation (b)
and of D± candidates in the data (c) and in the simulation (d). The solid line is a polynomial fit
to the mass distributions excluding the region 1.75-2.0 GeV/c².

FIG. 12. Distributions of the invariant mass of same flavor dileptons inside the same jet before
(a) and after tagging with SECVTX (b) and JPB (c).



C. J/ψ → μ⁺μ⁻ data

As shown in Table X, away-jets with a supertag are mostly due to bb̄ production, as is
the case for generic jets with a supertag. However, we see a discrepancy between observed
and predicted number of supertags after having calibrated the supertag efficiency in the
simulation by using generic jets. Since this suggests that the excess of SLT tags in
the away-jets is related to the requirement that a jet contains a presumed semileptonic b-decay
(lepton-jet), we study a complementary data sample enriched in bb̄ production but not in
semileptonic b-decays, i.e. events containing J/ψ → μ⁺μ⁻ decays. The data sample consists
of ~110 pb⁻¹ of pp̄ collisions collected by CDF during the 1992-1995 collider run. This
sample has been used for many analyses and is described in detail in Ref. [35]. Approximately
18% of these J/ψ mesons come from B decays [34]. Muon candidates are selected as in
Ref. [35]. Since we want to make use of the B lifetime to remove the contribution of prompt
J/ψ mesons, we select muons with SVX tracks. The dimuon invariant mass is calculated
without constraining the two muon tracks to a common vertex since the mass resolution is
not important in this check. In addition we require a jet with transverse energy larger than
15 GeV lying in the hemisphere opposite to the J/ψ and contained in the SVX acceptance.

The dimuon invariant mass distribution in these events is shown in Figure 13. In the
mass range between 3 and 3.2 GeV/c² there are 1163 J/ψ events over a background of 1179
events estimated from the side-band region (see Figure 13) ^10.



^10 The requirement of a recoiling away-jet reduces the number of J/ψ mesons in the original
data set by a factor of ~200.
FIG. 13. Invariant mass distribution of muon pairs. The shaded area indicates the J/ψ signal
region and the cross-hatched area indicates the side-band region, SB, used to estimate the
background.



The J/ψ lifetime is defined as

$$\tau = \frac{(\vec{L} \cdot \vec{p}_T)\, M}{c\, p_T^2}$$

where M and p_T are the dimuon invariant mass and transverse momentum and L is the
distance between the event vertex and the origin of the muon tracks. The lifetime distribution
of J/ψ candidates is shown in Figure 14. As studied in Ref. [35], prompt J/ψ candidates
produce a symmetric τ-distribution peaking at τ = 0. We call ψ⁺ and ψ⁻ the numbers of J/ψ
candidates with positive and negative lifetime; SB⁺ and SB⁻ are the analogous numbers for
the side-band region, which is used to estimate the background in the invariant mass
distribution. The number of J/ψ mesons from B decays is then N_ψ = ψ⁺ - ψ⁻ - (SB⁺ - SB⁻) = 561,
which is 48% of the initial sample. In the opposite hemisphere we find 572 away-jets. In
these a-jets we measure the following numbers of tags after mistag removal:

1. 48.0 ± 15.1 SECVTX tags 

2. 61.7 ± 17.3 JPB tags 

3. -9.4 ± 14.4 SLT tags 
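
The lifetime variable and the counting formula for J/ψ mesons from B decays, as defined above, can be sketched as follows. This is a minimal illustration and not the analysis code: the function names, the two-dimensional vector components, and the choice of units (cm, GeV/c, GeV/c², ps) are assumptions made here, and the counts passed to `n_from_b` in any example are hypothetical because the individual ψ⁺, ψ⁻, SB⁺ and SB⁻ values are not quoted in the text.

```python
C_CM_PER_PS = 0.0299792458  # speed of light in cm per picosecond


def pseudo_lifetime(lx, ly, px, py, mass):
    """tau = (L . p_T) M / (c p_T^2) in ps.

    lx, ly : transverse displacement of the dimuon vertex (cm)
    px, py : dimuon transverse momentum components (GeV/c)
    mass   : dimuon invariant mass (GeV/c^2)
    """
    pt2 = px * px + py * py
    return (lx * px + ly * py) * mass / (C_CM_PER_PS * pt2)


def n_from_b(psi_plus, psi_minus, sb_plus, sb_minus):
    """Lifetime-asymmetry count: the prompt component is symmetric around
    tau = 0 and cancels; the side-band asymmetry removes the background."""
    return (psi_plus - psi_minus) - (sb_plus - sb_minus)


if __name__ == "__main__":
    # Hypothetical candidate: 300 micron displacement along a 5 GeV/c p_T.
    print(pseudo_lifetime(0.03, 0.0, 5.0, 0.0, 3.097))
    # Hypothetical counts, for illustration only.
    print(n_from_b(100, 40, 30, 25))
```

A displacement opposite to the momentum direction gives a negative lifetime, which is how resolution effects populate the negative side of the distribution.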




FIG. 14. Lifetime distribution of J/ψ candidates (horizontal axis: τ in psec).

For 54.8 ± 11.5 lifetime tags (average of the observed number of SECVTX and JPB tags)
the simulation predicts 8.1 ± 1.7 SLT tags. The observed number of SLT tags is 1.2 σ lower
than the prediction rather than 50% larger as in the inclusive lepton sample.
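
The comparison above can be checked numerically. The sketch below is illustrative only and assumes that the observed and predicted SLT yields have independent uncertainties combined in quadrature; the names `pull` and `lifetime_tags` are introduced here for illustration.

```python
import math

# Tag yields in the away-jets after mistag removal, as (value, uncertainty).
secvtx_tags = (48.0, 15.1)
jpb_tags = (61.7, 17.3)
slt_observed = (-9.4, 14.4)
slt_predicted = (8.1, 1.7)

# Average number of lifetime tags used to normalize the SLT prediction.
lifetime_tags = 0.5 * (secvtx_tags[0] + jpb_tags[0])


def pull(observed, predicted):
    """(observed - predicted) in units of the combined uncertainty,
    assuming independent errors (an assumption of this sketch)."""
    return (observed[0] - predicted[0]) / math.hypot(observed[1], predicted[1])


if __name__ == "__main__":
    print(f"lifetime tags = {lifetime_tags:.2f}")
    print(f"pull = {pull(slt_observed, slt_predicted):.1f} sigma")
```

The negative pull of about 1.2 standard deviations corresponds to the statement that the observed yield lies below the prediction in this sample.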
XI. CONCLUSIONS



We have studied the heavy flavor properties of jets produced at the Tevatron collider.
This study is motivated by the evidence, reported in Ref. [1], for a class of jets that contain
long-lived objects consistent with b- or c-quark decays, identified by the presence of secondary
vertices (SECVTX tags) or of tracks with large impact parameters (JPB tags), but which also
have an anomalously large content of soft leptons (SLT tags); we refer to these as superjets
and supertags. The study in Ref. [1] focused on high-p_T jets produced in association with
W bosons. The analysis reported here uses a much larger data set collected with low-p_T
lepton triggers (p_T > 8 GeV/c). This data set has been previously used to study bottom
and charmed semileptonic decays, and to provide calibrations for the measurement of the
pair production of top quarks [14].

In the present analysis, we study events having two or more central jets with E_T >
15 GeV, one of which (lepton-jet) is consistent with a semileptonic bottom or charmed decay
to a lepton with p_T > 8 GeV/c. The measurement is a comparison between the data and a
HERWIG-based simulation of the semileptonic decay rate for the additional jets (away-jets),
which have no lepton trigger requirement. We first use measured rates of lepton- and
away-jets with SECVTX and JPB tags in order to determine the bottom and charmed content of
the data; we then tune the simulation to match the observed heavy-flavor content. Rates
of SECVTX and JPB tags and the kinematics of these events are well modeled after tuning
the parton-level cross sections predicted by HERWIG within the experimental and theoretical
uncertainties. The tuned parton-level prediction of HERWIG indicates that, in order to model
the single b production cross section measured at the Tevatron, any theoretical calculation
should predict higher-order-term contributions which are approximately a factor of three
larger than the LO contribution.

We then measure the yields of soft (p_T > 2 GeV/c) leptons due to heavy-flavor decays
in the away-jets, and compare them to the prediction of the tuned simulation. The latter
depends on the bottom and charmed semileptonic decay rates and on the soft lepton
reconstruction efficiency. To calibrate the predictions of the simulation, we perform the same
analysis on samples of generic jets with 20, 50, and 100 GeV E_T thresholds; these samples
have also been previously used to calibrate the simulation of heavy flavor background to
pair production of top quarks [14].

Finally, with these calibrations we find that away-jets have a 30-50% excess of soft
lepton tags as compared with the simulation, corresponding to 2.5-3.5 σ, depending on the
selection of the away-jets; the selections include (a) all away-jets, (b) a subset with SECVTX
tags, and (c) another subset with JPB tags (the three results are highly correlated and should
not be combined). The size of this excess is consistent with the differences between the NLO
prediction and the bb̄ cross section measurements at the Tevatron that are based upon the
detection of one and two leptons from b-quark decays. A possible interpretation of this
excess, the one that motivated this study, is the pair production of light scalar quarks with
a 100% semileptonic branching ratio. Due to the p_T > 8 GeV/c lepton-trigger requirement,
we expected such a signature to be enhanced in this sample as compared with generic-jet
data. However, alternative explanations for the excess are not excluded by this study, the
interpretation of which requires independent confirmations.